Abstract


We estimate a model of strategic voting and quantify the impact it has on election 
outcomes. Because the model exhibits multiplicity of outcomes, we adopt a set estimator.
Using Japanese general-election data, we find a large fraction [75.3%, 80.3%] of
strategic voters, only a small fraction [2.4%, 5.5%] of whom voted for a candidate other 
than the one they most preferred (misaligned voting). Existing empirical literature has 
not distinguished between the two, estimating misaligned voting instead of strategic 
voting. Accordingly, while our estimate of strategic voting is high, our estimate of
misaligned voting is comparable to previous studies.
1 Introduction


Strategic voting in elections has been of interest to researchers since Duverger (1954) and 
Downs (1957). Models of strategic voting are fundamental to the study of political economy, 
and have been used to investigate topics ranging from performance of different electoral 
rules to information aggregation in elections. Whether voters actually behave strategically, 
however, is an empirical question. 

Strategic voting is also of interest to politicians and voters. It is widely believed that if 
Ralph Nader had not run in the 2000 U.S. Presidential election, Al Gore would have won the 
election. The presence of minor candidates and third parties affects election outcomes, and 
the extent of that effect depends heavily on the fraction and behavior of strategic voters. 

In this paper, we study how to identify and estimate a model of strategic voting and 
quantify the impact strategic voting has on election outcomes by adopting an inequality-based
estimator. We estimate the model using aggregate municipality-level data from the
Japanese general election, which uses plurality rule. In our counterfactual policy experiments,
we investigate election outcomes under alternative electoral rules. Strategic voters are defined 
as those who make voting decisions conditioning on the event that their votes are pivotal. 
Unlike sincere voters who always vote according to their preferences, strategic voters do not 
necessarily vote for their most preferred candidate in plurality-rule elections with three or 
more candidates.¹

In our paper, we make a clear distinction between strategic voting, as defined above (this 
is the standard definition in the theoretical literature²), and voting for a candidate other
than the one the voter most prefers (hereafter referred to as misaligned voting). Strategic 
voters may vote for their most preferred candidate or they may not. Hence, the set of voters 
who engage in misaligned voting is only a subset of the set of strategic voters. Existing 
empirical literature has not distinguished between the two. In fact, previous attempts at 
estimating strategic voting have estimated misaligned voting instead of strategic voting. This 
distinction is important because the fraction of strategic voters is a model primitive while 
misaligned voting is an equilibrium object. In our paper we recover the extent of strategic 
voting, which allows us to conduct counterfactual policy experiments. 

Our model is an adaptation of Myerson and Weber (1993) and Myerson (2002) with the
addition of sincere voters.³ We relax the equilibrium requirement that Myerson and Weber


¹There are other behavioral models of voting, such as expressive voting (voters may vote for a candidate
to send a signal). We focus on sincere voting and strategic voting, which have been the main focus of the
empirical literature.

²See, e.g., the entry on “strategic voting” in The New Palgrave Dictionary of Economics by Feddersen
(2008). 

³Our model can be naturally extended to elections with N candidates competing for Ng (Ng < N) seats
place on voters’ beliefs on pivot probabilities. We use a weaker solution concept so that
the outcome of the model is robust to different assumptions regarding voter beliefs and can
better account for diverse patterns of outcomes as observed in the data.⁴ The consequence
of adopting a weaker solution concept is that we have to deal with the issue of multiplicity 
of solution outcomes in identification and estimation. 

The key source of the multiplicity of the solution outcomes - and hence the key source of 
difficulty in the identification of the model - is the presence of strategic voters. The difficulty 
stems from the fact that preference and voting behavior do not necessarily have a one-to-one 
correspondence for strategic voters. Our identification argument proceeds in three steps. 

First we derive restrictions in terms of how preferences, which we write as a function of 
demographic characteristics, relate to voting behavior at the individual level. Unlike in other 
applications of discrete-choice models, the fact that a voter votes for candidate A does not 
imply that the voter preferred candidate A most. It could well be that the voter preferred 
candidate B over A, but voted for A instead because the voter believed that candidate B 
had little chance of winning. However, we can infer from the voter’s behavior that the voter 
did not rank candidate A last in his order of preference. It is a weakly dominated strategy 
for all voters, sincere and strategic, to vote for their least preferred candidate. 

Second, we relate aggregate variation in the vote share to demographic characteristics 
using two particular features often found in general-election data. The first feature is that 
general-election data typically consist of data from many elections taking place simultaneously
(e.g., 646 elections for the House of Commons in the U.K., 435 elections for the U.S. House of
Representatives). This feature is essential for identification and estimation because we take
each election to be our unit of observation. The second feature is that the breakdown of votes
and demographic characteristics within each electoral district is available (e.g., county-level
breakdown of votes for U.S. Congressional elections). This data structure allows us to relate
variation in the vote share to variation in the demographic characteristics within a single 
electoral district, holding constant common components such as beliefs over tie probabilities
and candidate characteristics. This partially identifies the preference parameters. (For
the rest of the paper, we use the term “municipality” to denote the sub-district within an
electoral district, such as counties. Note that several municipalities comprise one “district”,
which in turn corresponds to one election. See Figure 1.)

Lastly, we consider identification of the extent of strategic voting. Intuitively, the variation
in the data that we would like to exploit is the variation in the voting outcome among
municipalities (in different districts) with similar characteristics vis-a-vis the variation in the


under single non-transferable voting as in Cox (1994).
⁴See footnote 12 for details.
Figure 1: Data Structure. The district, which is our unit of observation, comprises
multiple municipalities. A breakdown of the data is available at the municipality level.


vote shares and characteristics of other municipalities in the same district. For example, 
consider two liberal municipalities, one in a generally conservative electoral district and the 
other in a generally liberal district. Suppose that there are three candidates, a liberal, a 
centrist and a conservative candidate in both districts. If there are no strategic voters, we 
would not expect the voting outcome to differ across the two municipalities. However, in 
the presence of strategic voters, the voting outcome in these two municipalities could differ. 
If the strategic voters of the municipality in the conservative district believe that the liberal 
candidate has little chance of winning, those voters would vote for the centrist candidate, 
while strategic voters in the other municipality (in the liberal district) would vote for the 
liberal candidate according to their preferences (if they believe that the liberal candidate has 
a high chance of winning). 

More generally, given the preference parameters, the model can predict what the vote 
share would be in each municipality if all of the voters voted according to their preferences. If 
there were no strategic voters, the difference between the actual outcome and the predicted 
sincere-voting outcome would only be due to random shocks. However, when there is a 
large number of strategic voters, the actual vote share can systematically diverge from the 
predicted outcome. This is due to the multiplicity of equilibria induced by strategic voters. 
Recall that strategic voters make voting decisions conditional on the event that their votes 
are pivotal. If the beliefs regarding the probability of being pivotal differ across electoral 
districts — and we have no reason to believe that they do not — the behavior of strategic 
voters will also differ across districts. This corresponds to different equilibria being played in
different districts. Of course it is impossible to directly test for the relationship between voter
behavior and voter beliefs regarding tie probabilities as beliefs are unobservable. However,
we can still use the systematic difference between the predicted vote share and the actual 
vote share to partially identify the fraction of strategic voters. 

Our estimation applies an estimator based on moment inequalities developed by Pakes, 
Porter, Ho and Ishii (2007). We use a bounds estimator because our voting model does not 
yield a unique outcome and we may only be able to set-identify the model parameters. 

We use data on the Japanese House of Representatives elections for estimation.⁵ Once the
primitives of the model have been estimated, we investigate the extent of strategic voting 
using the estimated model. In our counterfactual policy experiments, we study how the 
outcome would change under proportional representation and under the assumption that all 
voters vote sincerely. 

We find that a large proportion [75.3%, 80.3%] of voters are strategic voters. We also 
recover the extent of misaligned voting once we estimate the model, by simulating the equilibrium
behavior. Our results show that [2.4%, 5.5%] of the voters engage in misaligned
voting, or [3.0%, 7.3%] of the strategic voters. In our first counterfactual experiment, in 
which we introduce proportional representation, we find that the number of votes for major 
parties decreases by a large margin, and the number of seats decreases by an even greater 
margin. 
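
As a rough consistency check on the intervals reported above (simple arithmetic for illustration, not a description of how the set estimates are obtained), dividing each misaligned-voting bound by the opposite strategic-voting bound approximately reproduces the interval stated for strategic voters. The pairing of endpoints is an assumption made here for illustration:

$$\frac{2.4\%}{80.3\%} \approx 3.0\%, \qquad \frac{5.5\%}{75.3\%} \approx 7.3\%.$$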

In our second counterfactual experiment, we investigate what the outcome would be if all 
voters vote sincerely under plurality rule. We find that the number of seats for the parties 
would change significantly: one party would add [17, 40] seats while another would lose [20, 
45] seats out of a total of 175 seats. Even though the extent of misaligned voting is small
[2.4%, 5.5%], the impact on the number of seats is considerable because the winning margin
is often small.


Related Literature There are both an experimental and an empirical literature on 
strategic voting in elections. In small-scale laboratory experiments with three candidates 
under plurality rule, Forsythe, Myerson, Rietz, and Weber (1993, 1996) find evidence of 
strategic voting.⁶ They also find that strategic voting is more likely to occur if pre-election
coordination devices such as polls and shared voting histories are available. 

There is also a large empirical literature on strategic voting (see, e.g., Alvarez and Nagler 
(2000), Blais, Nadeau, Gidengil, and Nevitte (2001) and papers cited therein). Previous
work in the literature has attempted to identify strategic voting by comparing each voter’s


⁵Our implementation does not depend on any specific institutional feature of the Japanese election. Our
approach can be applied to any election with plurality rule or single non-transferable voting.

⁶See Holt and Smith (2005), Morton and Williams (2006), Palfrey (2006), and Rietz (2008) for surveys
of the literature on experiments.


actual vote to his preferences. Voter preferences are proxied by measures such as voting 
behavior in previous elections and surveys eliciting voter preferences. However, as pointed 
out earlier, the difference between voting and preferences is a measure of misaligned voting 
rather than that of strategic voting. Accordingly, our estimate of misaligned voting [2.4%,
5.5%] is comparable to the estimates of strategic voting reported in the previous literature,
which range from 3% to 17%.⁷

One closely related paper is Degan and Merlo (2007). They consider the falsifiability of 
sincere voting, and show that individual-level observations of voting in at least two elections 
are required to falsify sincere voting. They examine whether there exists a preference profile 
that is consistent with the observed election outcome without imposing any relationship 
between preferences and observable covariates. Our approach relates preferences to voter 
covariates within a standard discrete-choice framework. Identification of voter preferences 
and the fraction of strategic voters is then possible without requiring data on repeated 
voting records. This is analogous to papers such as Berry, Levinsohn, and Pakes (1995) 
which estimate individual preferences using aggregate data.⁸

Our paper is also related to the literature on strategic voter turnout. Shachar and 
Nalebuff (1999) and Coate, Conlin, and Moro (2008) estimate a model of voter turnout in 
which voter turnout is a function of the expected closeness of the election. These papers 
study turnout focusing on two-candidate elections, a setting in which the issue of strategic
voting does not arise. Our paper focuses on the issue of strategic voting instead of strategic 
turnout, although it is conceptually straightforward to extend our approach to a model of 
elections with both strategic voting and strategic turnout. We discuss this extension at the 
end of Section 4. 

We describe the model in the next section, and explain the data in Section 3. Details 
on identification and estimation are provided in Section 4. Section 5 presents the results 
and the counterfactual experiments. Finally, we close the paper with concluding remarks in
Section 6.


⁷See Alvarez and Nagler (2000), Blais, Nadeau, Gidengil, and Nevitte (2001) and papers cited therein.

⁸Regarding the use of aggregate data, the political science literature has been concerned about the issue
of ecological inference (see, e.g., King, 1997). King (1997) proposes a solution to this problem by assuming
a random coefficients type model with a particular functional form. Our approach can be thought of as 
microfounding the distribution of the random coefficients in his statistical model. We do so by considering 
a game theoretic model of voting. 


2 Model 


2.1 Model Set-up 


Our model is an adaptation of Myerson and Weber (1993) [hereinafter denoted as MW] and 
Myerson (2002). We model plurality-rule elections in which K candidates compete for one 
seat. Voters cast a vote for one candidate,⁹ and the candidate receiving the highest number
of votes is elected to office (ties are broken with equal probability). We restrict attention
to the case when $K \geq 3$ since strategic voting is otherwise not an issue. There are $M$
municipalities in an electoral district, and we use subscript $m \in \{1, 2, \ldots, M\}$ to denote a
municipality. There are a finite number of voters, $\sum_{m=1}^{M} N_m < \infty$, who are the players of
the game ($N_m$ is the number of voters in municipality $m$). Voter $n$’s utility from having
candidate $k$ in office is


$$u_{nk} = u(x_n, z_k) + \xi_{km} + \varepsilon_{nk},$$


where $x_n$ are voter characteristics, $z_k$ are candidate characteristics, $\xi_{km}$ is a candidate-municipality
shock, such as the ability of a candidate to bring pork to municipality $m$, and
$\varepsilon_{nk}$ is an i.i.d. preference shock.

We consider two types of voters, sincere (behavioral) and strategic (rational). A sincere
voter casts his vote for the candidate he prefers most, i.e., a sincere voter votes for candidate
$k$ if and only if $u_{nk} \geq u_{nl}$, $\forall l$. On the other hand, a strategic voter casts his vote taking into
consideration that the only events in which his vote is pivotal are when the election is exactly
tied or when the second-place candidate is one vote behind. When voter $n$ is pivotal and
he casts the decisive vote between $k$ and $l$, he changes the outcome of the election. In this
situation, voting for candidate $k$ gives utility $\frac{1}{2}(u_{nk} - u_{nl})$.¹⁰ Hence, if we let $T_n = \{T_{n,kl}\}_{kl}$
denote voter $n$’s beliefs that candidates $k$ and $l$ will be tied for first place or that $k$ will be
one vote behind $l$, the expected utility from voting for candidate $k$ is given by¹¹


$$\bar{u}_{nk}(T_n) = \frac{1}{2} \sum_{l \neq k} T_{n,kl}\,(u_{nk} - u_{nl})$$


⁹We abstract from the issue of voter abstention. We discuss the issue of turnout at the end of Section 4.

¹⁰Voter $n$’s vote is pivotal in two cases. First, consider the case when candidates $k$ and $l$ are exactly
tied without voter $n$’s vote. In this case, candidate $k$ wins if voter $n$ votes for $k$. Because ties are broken
with equal probability for each candidate, the utility from voting for candidate $k$ is $u_{nk} - \frac{1}{2}(u_{nk} + u_{nl})$.
Second, consider the case when candidate $k$ is one vote behind candidate $l$ without voter $n$’s vote. The two
candidates will tie if voter $n$ votes for candidate $k$, while candidate $l$ wins if voter $n$ does not. Thus, the
utility from voting for $k$ is $\frac{1}{2}(u_{nk} + u_{nl}) - u_{nl}$. Therefore, in both cases, the utility from voting for candidate
$k$ is $\frac{1}{2}(u_{nk} - u_{nl})$.

¹¹We assume that voter beliefs over three-way ties are infinitesimal compared to two-way ties, as is commonly
assumed in the literature.


as in MW. Strategic voters vote for candidate $k$ if and only if $\bar{u}_{nk}(T_n) \geq \bar{u}_{nl}(T_n)$, $\forall l$.
Depending on the value of $T_n$, a strategic voter may choose to vote for any candidate other
than the one he prefers the least (i.e., the candidate $k$ with the lowest value of $u_{nk}$). We
come back to this fact when we discuss identification.
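
The voting rules above can be illustrated with a minimal Python sketch. It assumes the expected utility of voting for candidate k is one half of the belief-weighted sum of utility differences against each rival, as in the text. The function names, the dictionary representation of beliefs, the example numbers, and the lowest-index tie-break are all hypothetical choices made for illustration.

```python
def expected_utilities(u, T):
    """Expected utility of voting for each candidate.

    u: list of utilities u_nk, one entry per candidate.
    T: dict mapping frozenset({k, l}) to the belief that the pair (k, l)
       is the pivotal pair (illustrative representation).
    """
    K = len(u)
    return [
        0.5 * sum(T[frozenset((k, l))] * (u[k] - u[l]) for l in range(K) if l != k)
        for k in range(K)
    ]


def sincere_vote(u):
    # A sincere voter picks the candidate with the highest utility.
    return max(range(len(u)), key=lambda k: u[k])


def strategic_vote(u, T):
    # A strategic voter picks the candidate with the highest expected utility.
    ubar = expected_utilities(u, T)
    return max(range(len(u)), key=lambda k: ubar[k])


# Hypothetical voter who ranks candidate 0 first, 1 second, and 2 last.
u = [1.0, 0.6, 0.0]

# Beliefs concentrated on a close race between candidates 1 and 2.
T_far = {frozenset((0, 1)): 0.05, frozenset((0, 2)): 0.05, frozenset((1, 2)): 0.90}
# Beliefs concentrated on a close race between candidates 0 and 1.
T_near = {frozenset((0, 1)): 0.90, frozenset((0, 2)): 0.05, frozenset((1, 2)): 0.05}

vote_far = strategic_vote(u, T_far)    # misaligned: votes for the second choice
vote_near = strategic_vote(u, T_near)  # aligned: votes for the first choice
```

Under the first set of beliefs the strategic voter abandons his most preferred candidate, while under the second he votes exactly as a sincere voter would. In neither case does he vote for his least preferred candidate, whose expected utility can never be positive.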

Note that we distinguish strategic voting and misaligned voting as discussed in the Intro- 
duction. We define misaligned voting as casting a vote for a candidate other than the one the 
voter most prefers. Hence, only strategic voters engage in misaligned voting, but a strategic 
voter may or may not engage in misaligned voting. In other words, being a strategic voter 
is a necessary condition for misaligned voting, but not a sufficient condition. 

We assume that for at least some candidate pair $\{k, l\}$, the belief over the pivot probability,
$T_{n,kl}$, is non-zero. Even if there is an obvious frontrunner, there is always some chance that a
vote will be pivotal although it may be very small. As long as some $T_{n,kl}$ is always non-zero,
we can normalize $T_{n,kl}$ so that $\sum_{k=1}^{K} \sum_{l>k} T_{n,kl} = 1$. This normalization is possible because
a voter’s decision is determined by the relative size of $\bar{u}_{nk}(T_n)$, which is not affected by
rescaling $T_{n,kl}$ by a constant factor.
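
To see why the normalization is harmless, write $\bar{u}_{nk}(T_n)$ for the expected utility from voting for candidate $k$ defined above. It is linear in beliefs, so for any constant $c > 0$,

$$\bar{u}_{nk}(c\,T_n) = \frac{1}{2} \sum_{l \neq k} c\,T_{n,kl}\,(u_{nk} - u_{nl}) = c\,\bar{u}_{nk}(T_n),$$

and the ranking of candidates by expected utility is unchanged. Choosing $c = 1 / \sum_{k} \sum_{l>k} T_{n,kl}$ delivers beliefs that sum to one.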

We denote the type of voter $n$ in municipality $m$ by a random variable $\alpha_{nm} \in \{0, 1\}$
drawn from a binomial distribution, where $\alpha_{nm} = 0$ denotes the sincere voter and $\alpha_{nm} = 1$
denotes the strategic voter. We also let the mean of the binomial distribution be a random
variable drawn for each municipality from some distribution $F_\alpha$. Then the probability that
voter $n$ in municipality $m$ is a strategic voter can be written as


$$\Pr(\alpha_{nm} = 1 \mid \alpha_m) = \alpha_m,$$


where $\alpha_m$ is the municipality-level random term drawn from $F_\alpha$ and we assume that $\alpha_{nm} \perp \alpha_{n'm}$,
$\forall n, n'$, conditional on $\alpha_m$. The probability that the voter is sincere is $\Pr(\alpha_{nm} = 0 \mid \alpha_m) = 1 - \alpha_m$.
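
The two-stage draw of voter types can be sketched as follows. This is only an illustration: the text leaves the municipality-level distribution unspecified at this point, so the Beta distribution and its parameters below are assumptions, as are the function name and the sample size.

```python
import random


def draw_voter_types(n_voters, rng, a=8.0, b=2.0):
    """Draw a municipality-level strategic fraction, then individual types.

    Assumption: the municipality-level distribution is Beta(a, b); this is a
    placeholder for illustration, not the distribution used in the paper.
    """
    alpha_m = rng.betavariate(a, b)  # municipality-level probability of being strategic
    # Conditional on alpha_m, types are independent draws: 1 = strategic, 0 = sincere.
    types = [1 if rng.random() < alpha_m else 0 for _ in range(n_voters)]
    return alpha_m, types


rng = random.Random(0)
alpha_m, types = draw_voter_types(100_000, rng)
share_strategic = sum(types) / len(types)  # close to alpha_m when n_voters is large
```

With many voters, the realized share of strategic voters in a municipality is close to its drawn mean, which is what allows the vote shares to be approximated by their expectations.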


We make the following assumption on beliefs $T_n$, following MW.


Assumption Beliefs over tie probabilities $T_n$ are common across all voters in the same
electoral district, i.e., $T_n = T_{n'}$, $\forall n, n' \in \{1, \ldots, \sum_{m=1}^{M} N_m\}$.


This assumption simply requires voters in the same electoral district to have common beliefs
over pivot probabilities, $T$. The assumption reflects the fact that information regarding
the expected outcome of the election is widely available from news reports and poll results. 
By gaining access to this kind of information, voters in the same electoral district can form 
similar beliefs regarding the outcome. 

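Let $V^{SIN}_{km}$ be the fraction of votes cast by sincere voters for candidate $k$ in municipality
$m$, and let $V^{STR}_{km}(T)$ be the fraction of votes cast by strategic voters for candidate $k$. Note
that $V^{STR}_{km}(T)$ is a function of beliefs, $T$. We can write these fractions as


$$V^{SIN}_{km} = \frac{\sum_{n=1}^{N_m} (1 - \alpha_{nm}) \cdot 1\{u_{nk} \geq u_{nl}, \forall l\}}{\sum_{n=1}^{N_m} (1 - \alpha_{nm})}, \tag{1}$$


$$V^{STR}_{km}(T) = \frac{\sum_{n=1}^{N_m} \alpha_{nm} \cdot 1\{\bar{u}_{nk}(T) \geq \bar{u}_{nl}(T), \forall l\}}{\sum_{n=1}^{N_m} \alpha_{nm}}. \tag{2}$$


The total vote share for candidate $k$ in municipality $m$ is then


$$V_{km}(T) = \frac{\sum_{n=1}^{N_m} (1 - \alpha_{nm})}{N_m}\, V^{SIN}_{km} + \frac{\sum_{n=1}^{N_m} \alpha_{nm}}{N_m}\, V^{STR}_{km}(T).$$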
Note that these expressions are approximated by their expectations as the number of
voters, $N_m$, becomes large:


$$V^{SIN}_{km} \to v^{SIN}_{km} \equiv \int\!\!\int 1\{u_{nk} \geq u_{nl}, \forall l\}\, g(\varepsilon)\, d\varepsilon\, f_m(x)\, dx, \quad \text{and}$$


$$V^{STR}_{km}(T) \to v^{STR}_{km}(T) \equiv \int\!\!\int 1\{\bar{u}_{nk}(T) \geq \bar{u}_{nl}(T), \forall l\}\, g(\varepsilon)\, d\varepsilon\, f_m(x)\, dx,$$


where $f_m$ denotes the distribution of the demographic characteristics, $x$, in municipality $m$,
and $g$ denotes the distribution of the idiosyncratic shock, $\varepsilon_n = (\varepsilon_{n1}, \ldots, \varepsilon_{nK})$. We obtain these
expressions by computing the vote share for candidate $k$ among voters with given demographic
characteristics $x$, and then integrating this vote share with respect to characteristics $x$ using
its distribution $f_m$. We obtain a similar expression for the total vote share as $N_m$ becomes
large:

$$V_{km}(T) \to v_{km}(T) \equiv (1 - \alpha_m)\, v^{SIN}_{km} + \alpha_m\, v^{STR}_{km}(T). \tag{3}$$
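
The decomposition of a municipality's vote share into sincere and strategic components can be checked in a small simulation. The sketch below is illustrative only: it takes the strategic fraction as given, replaces the utility specification with a hypothetical candidate mean plus a standard normal shock, and uses made-up beliefs and sample sizes.

```python
import random


def best(values):
    # Index of the largest entry; ties go to the lowest index.
    return max(range(len(values)), key=lambda k: values[k])


def expected_utilities(u, T):
    # One half of the belief-weighted sum of utility differences against each rival.
    K = len(u)
    return [
        0.5 * sum(T[frozenset((k, l))] * (u[k] - u[l]) for l in range(K) if l != k)
        for k in range(K)
    ]


def simulate_municipality(n_voters, alpha_m, mean_u, T, rng):
    """Simulate votes in one municipality; assumes both voter types occur."""
    K = len(mean_u)
    sin_counts, str_counts = [0] * K, [0] * K
    for _ in range(n_voters):
        # Hypothetical utilities: candidate mean plus an i.i.d. normal shock.
        u = [mean_u[k] + rng.gauss(0.0, 1.0) for k in range(K)]
        if rng.random() < alpha_m:
            str_counts[best(expected_utilities(u, T))] += 1  # strategic voter
        else:
            sin_counts[best(u)] += 1                          # sincere voter
    n_sin, n_str = sum(sin_counts), sum(str_counts)
    V_sin = [c / n_sin for c in sin_counts]   # shares among sincere voters
    V_str = [c / n_str for c in str_counts]   # shares among strategic voters
    V = [(sin_counts[k] + str_counts[k]) / n_voters for k in range(K)]
    return V_sin, V_str, V, n_str / n_voters


rng = random.Random(1)
# Beliefs concentrated on a close race between candidates 0 and 1.
T = {frozenset((0, 1)): 0.90, frozenset((0, 2)): 0.05, frozenset((1, 2)): 0.05}
V_sin, V_str, V, share_str = simulate_municipality(20000, 0.8, [0.5, 0.3, 0.0], T, rng)
```

By construction, each total share equals the weighted sum of the sincere and strategic shares, with the realized strategic fraction as the weight. When beliefs put most weight on a race between candidates 0 and 1, strategic voters give candidate 2 a visibly smaller share than sincere voters do.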


2.2 Solution Outcome 


Until now, our model has been the same as the one considered in MW with the only difference 
being the presence of sincere voters. In order to take the model to the data, we relax 
the consistency requirement on beliefs, $T$, that MW place in equilibrium. The equilibrium
requirement on voters’ beliefs imposed by MW results in outcomes that may not rationalize
diverse patterns of actual election data even when we add sincere voters to their model.¹²


¹²In an election with three candidates, the original equilibrium of MW predicts that either (i) the first
place candidate wins, and the second and third place candidates receive exactly the same number of votes
(with corresponding beliefs $\{T_{12}, T_{13}, T_{23}\} = \{p, 1-p, 0\}$ for some $p \in [0,1]$) or (ii) the third place candidate


To better account for the variation in the data and be robust to alternative specifications 
regarding beliefs, we weaken MW’s consistency requirement on beliefs. Hence, our set of 
solution outcomes is a superset of the set of MW equilibria. 

Let us denote the district-level vote share, which is the total number of votes obtained
by a candidate divided by the total number of votes cast in the election, by
$V_k = \sum_{m=1}^{M} N_m V_{km} / \sum_{m=1}^{M} N_m$. MW imposes the following consistency requirement in equilibrium:
$V_k > V_l \Rightarrow T_{lj} \leq \varepsilon T_{kj}$, $\forall \varepsilon \in [0, 1)$, $\forall k, l, j$. This implies that pivot probabilities
involving candidates with low vote shares are zero. The consistency requirement (C1) we
impose between beliefs, $T$, and the election outcomes is a weaker version of MW’s ordering
condition:


C1: For an election with $K$ candidates,


$$V_k > V_l \Rightarrow T_{kj} \geq T_{lj}, \quad \forall k, l, j \in \{1, \ldots, K\}.$$


This condition implies that pivot probabilities involving candidates with high vote shares
are at least as large as those involving candidates with low vote shares. For the case of $K = 3$ with vote shares $V_1 > V_2 > V_3$,
C1 implies that $T_{12} \geq T_{13} \geq T_{23}$, i.e., beliefs on the pivot probability between candidates
1 and 2, $T_{12}$, are at least as high as those between candidates 1 and 3, $T_{13}$, and so on.
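
A minimal sketch of how condition C1 could be checked for given vote shares and beliefs follows. The function names, the dictionary representation of beliefs, and the example numbers are hypothetical. The sketch also assumes that the third candidate j in the condition differs from both k and l, since a pivot probability between a candidate and himself is not defined.

```python
def district_shares(N, V):
    """District-level vote shares as voter-weighted averages.

    N[m]: number of voters in municipality m.
    V[m][k]: vote share of candidate k in municipality m.
    """
    total = sum(N)
    K = len(V[0])
    return [sum(N[m] * V[m][k] for m in range(len(N))) / total for k in range(K)]


def satisfies_c1(V_district, T):
    """Check: V_k > V_l implies T_kj >= T_lj for every other candidate j."""
    K = len(V_district)
    for k in range(K):
        for l in range(K):
            if V_district[k] > V_district[l]:
                for j in range(K):
                    if j in (k, l):
                        continue  # assumption: j is a third candidate
                    if T[frozenset((k, j))] < T[frozenset((l, j))]:
                        return False
    return True


# Hypothetical district with two municipalities and three candidates.
N = [1000, 3000]
V = [[0.50, 0.30, 0.20], [0.40, 0.45, 0.15]]
V_district = district_shares(N, V)

T_ok = {frozenset((0, 1)): 0.7, frozenset((0, 2)): 0.2, frozenset((1, 2)): 0.1}
T_bad = {frozenset((0, 1)): 0.1, frozenset((0, 2)): 0.2, frozenset((1, 2)): 0.7}

c1_ok = satisfies_c1(V_district, T_ok)    # beliefs ordered like the vote shares
c1_bad = satisfies_c1(V_district, T_bad)  # beliefs ordered the other way round
```

In this example candidate 0 leads at the district level even though candidate 1 leads in the larger municipality, and only the first set of beliefs respects the ordering required by C1.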

Our second condition, C2, simply requires that given beliefs $T$, strategic voters vote
optimally (and sincere voters vote for their most preferred candidate). Now we define the
solution outcome of the voting game.


Definition A set of solution outcomes $W \subset \Delta^{{}_{K}C_{2}} \times (\times_{m=1}^{M} \Delta^{K})$ is defined as the set
$W = \{(T, \{V_{km}\}_{k,m})\}$ such that C1 and C2 are satisfied.


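C1: $V_k > V_l \Rightarrow T_{kj} \geq T_{lj}$, $\forall k, l, j \in \{1, \ldots, K\}$.


C2: $V_{km} = \frac{\sum_{n=1}^{N_m} (1 - \alpha_{nm})}{N_m}\, V^{SIN}_{km} + \frac{\sum_{n=1}^{N_m} \alpha_{nm}}{N_m}\, V^{STR}_{km}(T)$.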


A few comments are in order. First, the set of solution outcomes, W, is not empty; that
is, a solution outcome exists. This can be shown in a similar way as in the proof of Theorem


receives zero votes (with beliefs $\{T_{12}, T_{13}, T_{23}\} = \{1, 0, 0\}$).

Even if we (1) introduce sincere voters, (2) add shocks to voter preferences or (3) introduce randomness to
the fraction of strategic voters (or any combination of (1), (2), and (3)) to MW, there would still only be two
types of equilibria: one with beliefs $\{T_{12}, T_{13}, T_{23}\} = \{p, 1-p, 0\}$ and the other with $\{T_{12}, T_{13}, T_{23}\} = \{1, 0, 0\}$.
As before, equilibrium (i) has the undesirable property that the second and third candidates receive exactly
the same number of votes. In equilibrium (ii), all three candidates can receive a positive and different number
of votes, but the only beliefs that can support the equilibrium are $\{T_{12}, T_{13}, T_{23}\} = \{1, 0, 0\}$, which is a strong
assumption to impose, unlikely to hold in many races.
1 in MW. The proof is in Appendix A. Second, W is not a singleton in general. In order
to cope with the issue of multiplicity of solution outcomes, we adopt an inequality-based
estimator in our estimation. Third, W is a superset of the set of equilibria considered in
MW. This is because condition C1 is weaker than that of MW. Finally, note that W does
not depend on the information structure of the model, i.e., whether we assume that the
voters know the realizations $\alpha_{nm}$ and $\varepsilon_n$ of other voters, or only their distributions.

Finally, we remark on the empirical restriction implied by our solution outcome. Note
that C2 embodies the restriction that no voter votes for his least preferred candidate through
equations (1) and (2), which give the expressions for vote shares of the sincere and strategic
voters. However, beyond this restriction, the model leaves considerable freedom in how
$V^{STR}_{km}(T)$ is linked to voter preferences. This is because the solution outcome does not pin
down $T$ (only a weak restriction is imposed via C1), nor do we observe the value of $T$. Hence,
the empirical content of our solution outcome would be similar if we had instead adopted
rationalizability¹³ as our solution concept (see Bernheim, 1984; Pearce, 1984).


3 Data 


We use data from the Japanese House of Representatives election held on September 11, 
2005. Out of a total number of 480 Representatives, 300 members were elected by plurality 
rule. We use the data from these 300 plurality-rule elections.¹⁴ For each electoral district, the
breakdown of vote-share data is available by municipality as shown in Figure 1. An electoral
district is usually comprised of several municipalities (9.26 on average, in our sample).¹⁵ This
particular data structure plays an important role in our identification. 

We obtained the data on the vote shares and candidate characteristics from Yomiuri 
Shimbun, a national newspaper publisher. The demographic characteristics we use are ob- 
tained from the Social and Demographic Statistics of Japan published by the Statistics 
Bureau of the Japanese Ministry of Internal Affairs and Communications.¹⁶ We match these
two data sets at the municipality level.


¹³To be more precise, perfect rationalizability of Bernheim (1984) or cautious rationalizability of Pearce
(1984).

¹⁴An additional 180 Representatives were elected by proportional representation from 11 regional electoral
districts. In proportional representation, voters cast ballots for parties, and a closed list is used to determine 
the winner. It is possible for a person to be a candidate in both plurality and proportional elections. When 
two candidates are ranked equally on the party list, the results of the plurality rule election affect the relative 
rank of the two candidates. Only the LDP and the DPJ ranked more than two candidates equally in this 
election. 

¹⁵In the vast majority of cases, municipal borders do not cross electoral districts.

¹⁶The basic information for the data is available at http://www.stat.go.jp/english/data/ssds/outline.htm
and http://www.stat.go.jp/english/data/zensho/intex.html. 
Out of a total of 300 districts, we keep the districts that satisfy the following criteria.


(i) There are three or four candidates,¹⁷ and the composition of the candidates’ parties
in the district is any three or four of the following four parties: the Liberal Democratic
Party (LDP), the Democratic Party of Japan (DPJ), the Japan Communist Party
(JCP), or the Yusei (YUS). Technically, the Yusei is not a single party, but we grouped
former LDP candidates who split away from the LDP and ran on a common platform
against postal privatization.
(ii) There are at least two municipalities within the electoral district. 


(iii) There are no mergers of municipalities within the electoral district during the 
period from April 1, 2004 to the day of the election. 


We are left with 175 electoral districts. We drop samples that do not satisfy criterion 
(i) because we treat party affiliation as a candidate characteristic, and we cannot precisely 
estimate the coefficients on parties that only fielded a very small number of candidates. 
Criterion (i) ensures that we have enough elections with the same combination of parties 
fielding candidates to construct our moment inequalities.¹⁸ We need criterion (ii) because
our estimation requires at least two municipalities in each electoral district. Criterion (iii)
is required to deal with an issue that arises when merging two data sets. Because the demographics
data and the vote share data are collected on different dates (April 1, 2004
and September 11, 2005), municipalities that merged with others between these dates are 
dropped from the sample. In some cases, however, we are able to match the data properly. 
When this is possible, we keep the merging municipalities in the sample. 
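
The three selection criteria can be expressed as a simple filter. The sketch below is only illustrative: the record layout, field names, and example districts are hypothetical. It also ignores two refinements described in the text, namely the cases where merged municipalities could still be matched and kept, and the exclusion of a district whose party combination appears nowhere else.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_PARTIES = {"LDP", "DPJ", "JCP", "YUS"}
WINDOW_START = date(2004, 4, 1)    # date of the demographic data
ELECTION_DAY = date(2005, 9, 11)   # date of the election


@dataclass
class District:
    name: str
    parties: list          # party label of each candidate (hypothetical field)
    n_municipalities: int  # number of municipalities in the district
    merger_dates: list     # dates of municipal mergers inside the district


def keep_district(d):
    # (i) three or four candidates, all from distinct parties in the allowed set
    crit_i = (
        len(d.parties) in (3, 4)
        and len(set(d.parties)) == len(d.parties)
        and set(d.parties) <= ALLOWED_PARTIES
    )
    # (ii) at least two municipalities
    crit_ii = d.n_municipalities >= 2
    # (iii) no municipal merger between the two data collection dates
    crit_iii = not any(WINDOW_START <= m <= ELECTION_DAY for m in d.merger_dates)
    return crit_i and crit_ii and crit_iii


# Made-up districts for illustration only.
districts = [
    District("A", ["LDP", "DPJ", "JCP"], 5, []),
    District("B", ["LDP", "DPJ"], 4, []),
    District("C", ["LDP", "DPJ", "JCP"], 1, []),
    District("D", ["LDP", "DPJ", "JCP", "YUS"], 8, [date(2005, 1, 1)]),
    District("E", ["LDP", "DPJ", "Other"], 6, []),
]
kept = [d.name for d in districts if keep_district(d)]
```

In this made-up sample only district A survives: B has two candidates, C has a single municipality, D had a merger inside the window, and E fields a candidate from outside the four parties.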

We report the descriptive statistics of electoral-district vote shares in Table 1. There are 
9.26 municipalities per electoral district on average. The average winner’s vote share is about 
52% and the winning margin is about 14%. The mean vote share of the winner is higher 
in three-candidate districts (52.9%) than in four-candidate districts (41.2%). The mean 
winning margin is also higher in three-candidate districts (14.2%) than in four-candidate 


districts (9.4%). Similarly, the margin between the second- and third-place candidates is 


¹⁷We do not include 15 observations in which there are only two candidates for technical reasons. We use
an estimator of Pakes, Porter, Ho and Ishii (2007) in our estimation, but it is not clear whether their method
of inference can be applied when some of the parameters are point-identified. While two-candidate districts
contain no information about the extent of strategic voting, they point-identify some of the preference
parameters of the voters. For our estimation, this is problematic. Alternatively, we can use other inequality-based
estimators (e.g., Chernozhukov, Hong and Tamer (2007)), which give consistent estimates even when
parameters are point-identified. However, this comes at a very high computational cost in our application.

¹⁸The Kagoshima 5th District is dropped from the sample because no other district had the same combination
of parties fielding candidates (LDP, JCP, YUS) as this district. This is the only district we dropped
that satisfied all three criteria.




                                       mean   st. dev.      min      max    # obs

# of municipalities per district       9.26      7.14        2       36      175
   3-candidate district                8.67      6.82        2       36      158
   4-candidate district               14.71      7.67        3       36       17

winner’s vote share (%)               51.74      6.69    28.98    73.61      175
   3-candidate district               52.87      5.60    36.03    73.61      158
   4-candidate district               41.23      6.84    28.98    55.89       17

winning margin (%)                    13.71     10.15     0.06    53.91      175
   3-candidate district               14.17     10.09     0.16    53.91      158
   4-candidate district                9.40      9.71     0.06    35.50       17

margin between 2nd and 3rd (%)        28.47      9.46     0.57    43.32      175
   3-candidate district               30.37      7.40     4.74    43.32      158
   4-candidate district               10.71      8.04     0.57    23.32       17

vote share: JCP                        7.74      3.00     2.77    23.30      170

vote share: DPJ                       38.37      8.82    10.78    60.10      175

vote share: LDP                       49.71      8.89    22.00    73.62      175

vote share: YUS                       35.02      8.87    14.50    49.58       22


Table 1: Descriptive Statistics of Electoral Districts — Vote Shares 


significantly lower in four-candidate districts than in three-candidate districts. The last four 
rows report the vote-share breakdown for the four political parties. The mean vote share of 
the LDP is 49.7%, the highest among all parties. It is followed by the DPJ with 38.4%, the 
YUS with 35.0% and the JCP with 7.7%.$^{19}$

Table 2 reports the descriptive statistics of candidate characteristics. The first three rows contain information on the candidates’ hometowns.$^{20}$ The next three rows provide descriptive statistics on the candidates’ political experience. An average of 1.32 (in three-candidate districts) and 1.47 (in four-candidate districts) candidates are incumbents. Note that the number of incumbents is higher than 1 because some candidates who had previously been elected to the House of Representatives in a proportional-rule election ran in the plurality election. Less than 0.51 candidates on average have previously held public office.$^{21}$

Table 3 reports the descriptive statistics of the municipalities’ demographic characteristics. The mean income per capita is about 3.16 million yen (about $35,000), and the mean


19 Note that the sum of these percentages is greater than 100%. This is because not all parties field candidates in every district.

20 In case a candidate has a hometown in his/her electoral district (as reported in the first row), we have additional information on candidates’ hometowns that identifies exactly which municipality the candidate’s hometown is in. We do not report it here, but use it in our estimation.

21 This includes former and current municipality councillors, mayors, members of a prefectural assembly, prefectural governors, and Members of the House of Councillors, as well as former Members of the House of Representatives.




                                                      3-cand.       4-cand.
                                                      district      district

# of candidates w/ hometown in district               [illegible]   [illegible]
# of candidates w/ hometown in prefecture             [illegible]   [illegible]
# of candidates w/ hometown in another pref.          [illegible]   [illegible]
# of incumbents                                       1.32          1.47
# of candidates who previously held public office     [illegible]   [illegible]
# of candidates with no exp. in public office         [illegible]   [illegible]
# of observations                                     158           17


Table 2: Descriptive Statistics of Electoral Districts - Candidate Characteristics. The mean of each variable is reported. Standard errors are in parentheses.


                                        mean   st. dev.     min     max        # obs

income per capita (in million yen)      3.16      0.42     2.27   [illegible]  1,621
years of schooling  < 11 years (%)     35.00     12.37     7.16   71.08        1,621
                    12-14 years (%)    45.41      6.37    20.09   62.59        1,621
                    15-16 years (%)     9.83      3.34     2.86   19.41        1,621
                    > 16 years (%)      9.76      5.86     1.51   39.38        1,621
population above age 65 (%)            22.45      7.16     8.06   49.71        1,621


Table 3: Descriptive Statistics of Municipalities


length of schooling is about 12 years on average. The mean fraction of the population above age 65 is 22.5 percent. In the estimation, we use the distribution of demographic characteristics, which is readily available for years of schooling and age. Regarding income, only the mean of the distribution was available at the municipality level. We use the prefectural Gini coefficients as well as the average income to construct the distribution.$^{22}$
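As an illustration of how a mean and a Gini coefficient can pin down a log-normal income distribution, the following minimal sketch inverts the standard log-normal relation Gini = 2Φ(σ/√2) − 1. This relation is a textbook property of the log-normal and is an assumption of the sketch, not a description of the fitting procedure actually used; the function names and the numerical inputs (a Gini of 0.30) are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def lognormal_params(mean_income, gini):
    """Return (mu, sigma) of a log-normal with the given mean and Gini coefficient."""
    # For a log-normal, Gini = 2 * Phi(sigma / sqrt(2)) - 1, so invert for sigma.
    sigma = sqrt(2.0) * STD_NORMAL.inv_cdf((gini + 1.0) / 2.0)
    # The mean of a log-normal is exp(mu + sigma^2 / 2), so solve for mu.
    mu = log(mean_income) - sigma ** 2 / 2.0
    return mu, sigma


def income_quantile(mean_income, gini, p):
    """p-th quantile of the implied log-normal income distribution."""
    mu, sigma = lognormal_params(mean_income, gini)
    return exp(mu + sigma * STD_NORMAL.inv_cdf(p))


# Hypothetical municipality: mean income 3.16 million yen, prefectural Gini 0.30.
mu, sigma = lognormal_params(3.16, 0.30)
quartiles = [income_quantile(3.16, 0.30, p) for p in (0.25, 0.50, 0.75)]
```

The quantiles can then be used to discretize income into ranges, in the same way that the schooling and age distributions are available in discrete categories.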


22 We have data on the total taxable income and the total number of taxpayers for each municipality. The mean income for each municipality can be computed from these numbers. We compute the quantiles of the income distribution by assuming a log-normal distribution where the variance is calculated by fitting the prefecture-level income distribution. Data on the prefecture-level income distribution are obtained from the 2004 National Survey of Family Income and Expenditure published by the Statistics Bureau of the Japanese Ministry of Internal Affairs and Communications.

4 Identification and Estimation


We first describe the econometric specification of the model presented in Section 2 in order to facilitate our identification and estimation arguments. Then, we discuss identification of the model and estimation.


4.1 Specification 


We specify the utility function of voter $n$ in municipality $m$ with candidate $k$ elected to office as

\[ u_{nkm} = u(x_n, z_{km}; \theta^{PREF}) + \xi_{km} + \varepsilon_{nk}, \]

where $\xi_{km}$ is an i.i.d. idiosyncratic candidate-municipality level shock which follows a normal distribution, $N(0, \sigma_\xi)$, denoted as $F_\xi$, and $\varepsilon_{nk}$ is an i.i.d. idiosyncratic voter-candidate level shock which follows a Type-I extreme value distribution. An example of $\xi_{km}$ is the candidate’s ability to bring pork spending to municipality $m$. $\theta^{PREF}$ is a vector of preference parameters. $x_n$ denotes the characteristics of voter $n$, including years of education, income level, and an indicator of whether or not the voter is above age 65. $z_{km} = \{z_k^{POS}, z_{km}^{QLTY}\}$ is a vector of observable attributes of candidate $k$ in municipality $m$. We partition $z_{km}$ depending on how it interacts with voter characteristics. Let $z_k^{POS}$ be the attributes of candidate $k$ which are related to his ideological position, such as his party affiliation. Let $z_{km}^{QLTY}$ be other non-ideological attributes of candidate $k$, such as the candidate’s previous political experience and an indicator of whether municipality $m$ is the candidate’s hometown (which is why $z_{km}^{QLTY}$ is indexed by $m$). As for $u(x_n, z_{km}; \theta^{PREF})$, we assume a functional form with a quadratic loss term in the distance between the voter’s and the candidate’s ideological positions:

\[ u(x_n, z_{km}; \theta^{PREF}) = -\left(\theta^{ID} x_n - \theta^{POS} z_k^{POS}\right)^2 + \theta^{QLTY} z_{km}^{QLTY}, \]

where $\theta^{PREF} = (\theta^{ID}, \theta^{POS}, \theta^{QLTY})$. We consider a unidimensional ideological space, and let the ideology of the voter be a function of his demographics, $\theta^{ID} x_n$, and the ideology of the candidate be $\theta^{POS} z_k^{POS}$. The utility of the voter depends on the distance between his ideology, $\theta^{ID} x_n$, and that of the candidate, $\theta^{POS} z_k^{POS}$, which is captured by the quadratic term. The additive term captures the non-ideological component of utility, which we write as $\theta^{QLTY} z_{km}^{QLTY}$.
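To make the functional form concrete, the following minimal sketch evaluates the deterministic part of utility (a quadratic ideological loss plus a linear non-ideological term) for one voter and one candidate. The parameter values, the choice of covariates, and the function names are hypothetical and serve only as an illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def deterministic_utility(theta_id, theta_pos, theta_qlty, x, z_pos, z_qlty):
    """Quadratic loss in ideological distance plus a non-ideological term."""
    voter_ideology = dot(theta_id, x)            # position of the voter
    candidate_ideology = dot(theta_pos, z_pos)   # position of the candidate
    return -(voter_ideology - candidate_ideology) ** 2 + dot(theta_qlty, z_qlty)


# Hypothetical values: x = (constant, above-65 indicator), two party dummies,
# and a single quality attribute (an incumbency indicator).
theta_id, theta_pos, theta_qlty = [0.5, 1.0], [1.0, 2.0], [0.5]
x = [1.0, 1.0]        # a voter above age 65
z_pos = [0.0, 1.0]    # candidate from the second party
z_qlty = [1.0]        # candidate is an incumbent

u = deterministic_utility(theta_id, theta_pos, theta_qlty, x, z_pos, z_qlty)
```

In this example the voter sits at 1.5 and the candidate at 2.0 on the ideological line, so the loss is 0.25 and the incumbency term of 0.5 leaves a deterministic utility of 0.25, to which the two shocks would be added.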

As described in the model section, the objective of a sincere voter is to vote for the candidate $k$ who gives the highest value of $u_{nkm}$, while the objective of a strategic voter is to vote for the candidate $k$ who gives the highest value of $\bar{u}_{nkm}(T)$, where

\[ \bar{u}_{nkm}(T) = \frac{1}{2} \sum_{l \in \{1, \ldots, K\}} T_{kl} \left(u_{nkm} - u_{nlm}\right). \]

As we discussed in Section 2, we assume that for at least some candidate pair $\{k, l\}$, $T_{kl}$ is positive, no matter how small. This allows us to normalize $T$ so that $\sum_{k} \sum_{l > k} T_{kl} = 1$, because utility representation is invariant to multiplication by a constant factor.
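The difference between the two objectives can be illustrated with a small sketch for three candidates. The utilities and tie beliefs below are hypothetical; the tie beliefs are stored as a symmetric matrix whose off-diagonal pairs sum to one, and any positive constant in front of the sum is dropped because it does not affect the ranking of candidates.

```python
def sincere_choice(u):
    """Index of the candidate with the highest utility."""
    return max(range(len(u)), key=lambda k: u[k])


def strategic_choice(u, T):
    """Index of the candidate maximizing sum_l T[k][l] * (u[k] - u[l])."""
    K = len(u)

    def pivotal_utility(k):
        return sum(T[k][l] * (u[k] - u[l]) for l in range(K) if l != k)

    return max(range(K), key=pivotal_utility)


u = [1.0, 0.8, 0.0]  # hypothetical utilities for candidates A, B, C

# Beliefs concentrated on a B-C tie: T_AB = T_AC = 0.01, T_BC = 0.98.
T_bc = [[0.0, 0.01, 0.01],
        [0.01, 0.0, 0.98],
        [0.01, 0.98, 0.0]]

# Beliefs concentrated on an A-B tie: T_AB = 0.98, T_AC = T_BC = 0.01.
T_ab = [[0.0, 0.98, 0.01],
        [0.98, 0.0, 0.01],
        [0.01, 0.01, 0.0]]

vote_sincere = sincere_choice(u)
vote_strategic_bc = strategic_choice(u, T_bc)
vote_strategic_ab = strategic_choice(u, T_ab)
```

Under the first set of beliefs the strategic voter abandons the most-preferred candidate A for B (misaligned voting), while under the second the same strategic voter votes for A, exactly as a sincere voter would.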

Recall that we denote the type of voter $n$ in municipality $m$ by a random variable $\alpha_{nm} \in \{0, 1\}$ drawn from a binomial distribution, where $\alpha_{nm} = 0$ denotes the sincere voter and $\alpha_{nm} = 1$ denotes the strategic voter. We also let the mean of the binomial distribution be a random variable drawn for each municipality from some distribution $F_\alpha$. Then the probability that voter $n$ in municipality $m$ is a strategic voter can be written as

\[ \Pr(\alpha_{nm} = 1) = \alpha_m, \]

where $\alpha_m$ is the municipality-level random term drawn from a Beta distribution, $\mathrm{Beta}(\theta_{\alpha 1}, \theta_{\alpha 2})$, denoted as $F_\alpha$.
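The two-level structure of voter types can be sketched as a simple simulation: a municipality-level fraction is drawn from a Beta distribution, and each voter is then strategic with that probability. The Beta parameters, the number of voters, and the function name below are hypothetical.

```python
import random


def draw_municipality(theta_a1, theta_a2, n_voters, rng):
    """Draw the municipality-level fraction, then each voter's type."""
    alpha_m = rng.betavariate(theta_a1, theta_a2)   # fraction of strategic voters
    # Type 1 = strategic, type 0 = sincere, each drawn with probability alpha_m.
    types = [1 if rng.random() < alpha_m else 0 for _ in range(n_voters)]
    return alpha_m, types


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
alpha_m, types = draw_municipality(4.0, 1.0, 20000, rng)
share_strategic = sum(types) / len(types)
```

With a large number of voters the realized share of strategic voters is close to the drawn fraction, which is why the municipality-level fraction can be treated as the share of strategic voters in the municipality.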


4.2 Identification 


In this subsection, we discuss the identification of the model when we let the number of districts (denoted as $D$) go to infinity. As described in the Data Section, our election data include observations from many districts, for each of which we have a municipality-level breakdown of vote-share data and demographic characteristics. In terms of our notation, the number of districts is large ($D \to \infty$), but the number of municipalities per electoral district, denoted by $M^d$, is small ($M^d < \infty$, $\forall d \in \{1, \ldots, D\}$). We assume that voting games (i.e., elections) are played in $D$ districts independently of each other, and we treat each district as a unit of observation.

Our identification argument proceeds in two steps. We first discuss partial identification 
of preference parameters. Then, given partial identification of preference parameters, we 


discuss partial identification of the fraction of strategic voters. 


4.2.1 Partial Identification of Preference Parameters 


Preference parameters are (partially) identified by the relationship between demographic and 
vote-share variation within each electoral district that we observe in the data. In order to 


exploit this variation for identification of preference parameters, the main restriction we use 




is that voters do not vote for their least preferred candidate. This restriction, however, does not give us point identification: the restriction only implies whom a voter will not vote for, but it does not imply whom a voter will vote for. The question of whom a voter will vote for is determined by $T^d$ (beliefs over tie probabilities in district $d$), which is neither observable to the econometrician nor uniquely determined by our solution concept.

If $T^d$ were known (either observed or uniquely pinned down by our solution concept), the data would point-identify the preference parameters. As $T^d$ is neither uniquely determined nor observable, the identified set of preference parameters is the union of parameter values, each corresponding to a value of $T^d \in \Delta^{{}_{K}C_{2}}$. Notice how our consistency requirement on the beliefs, C1, constrains the identified set by putting a restriction on the set of values $T^d$ can take. Without C1, $T^d$ can take any value as long as it adds up to one. Below, we illustrate how parameters are partially identified (and not point-identified).

Consider the preference parameter, $\theta^{above65}$, which captures the effect of age (being 65 or older) on ideology, for the case of $K = 3$. Take two municipalities ($\{m^1_{old}, m^1_{young}\}$) in the same district, one with a high proportion of voters above age 65 (say 30%) and the other with a low proportion (say 20%), but otherwise with similar demographic characteristics. Now take pairs of municipalities in other districts ($\{m^2_{old}, m^2_{young}\}, \{m^3_{old}, m^3_{young}\}, \ldots$) that have similar demographic characteristics to the first pair (i.e., one with 30% of voters above age 65 and the other with 20%, with the other demographic characteristics similar to the first pair). It is important to note that each pair belongs to the same district and that we have many such pairs (from many different districts). Suppose that the vote share for Party A in the “old” municipalities is 5% higher than in the “young” municipalities on average. Then this implies that being older makes voters ideologically closer to Party A. But how much closer depends on the beliefs $T^d$.

In order to exhibit how $T^d$ affects identification of $\theta^{above65}$, consider two polar cases as in Figure 2:

Case 1: $T^d$ is such that the tie probability between candidates from Parties B and C is close to one, and that the other two tie probabilities are close to zero, for all $d$ ($T^d_{BC} \approx 1$, $T^d_{AB} \approx T^d_{AC} \approx 0$, $\forall d$).

Case 2: $T^d$ is such that the tie probability between candidates from Parties A and C is close to zero, and that the other two tie probabilities are near 0.5, for all $d$ ($T^d_{AB} \approx T^d_{BC} \approx 0.5$, $T^d_{AC} \approx 0$, $\forall d$).


In Case 1, no strategic voter votes for the Party A candidate; hence, the 5% increase in 
the vote shares of Party A candidates in the “old” municipalities must be attributed to the 


difference in the sincere voters’ behavior alone. Because the 5% increase must be explained 


[Figure 2 appears here. For Case 1 ($T_{AB} = T_{AC} = 0$, $T_{BC} = 1$) and for Case 2, the panels show the voters and the resulting vote shares of candidates A, B and C in the “young” and the “old” municipalities.]

Figure 2: Identification of Preference. Sincere voters are illustrated with regular circles, and strategic voters, with dotted circles. The letters inside the circle indicate the most-preferred candidate and the superscripts indicate the second preferred candidate for strategic voters. The rectangles indicate the respective vote shares. In Case 1, only sincere voters who prefer candidate A the most vote for A. In Case 2, both sincere and strategic voters who prefer candidate A the most vote for A. In Case 1, the 5% difference in the vote share is then attributed to the difference in the behavior of only sincere voters, while it is attributed to the difference in the behavior of both types of voters in Case 2. Thus, the effect of demographic characteristics on utility depends on $T$.


only by the fraction of the population that is sincere, the effect of the parameter $\theta^{above65}$ must be quite large. In Case 2, the votes for Party A candidates come not only from sincere voters, but also from strategic voters. The 5% increase in the vote share for Party A candidates can then be accounted for by the difference in the behavior of both sincere and strategic voters. Thus, compared to Case 1, the value of $\theta^{above65}$ will be relatively small in Case 2 because we can attribute the 5% increase to the difference in the behavior of both types of voters. As $T^d$ is unobservable, we can rule out neither Case 1 nor Case 2. Thus the identified set for $\theta^{above65}$ will be a set that includes the values implied by Case 1 and by Case 2.
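A back-of-the-envelope calculation shows why the two cases imply different magnitudes. The sketch below assumes, purely for illustration, that 80% of voters are strategic and that the 5% gap must be generated entirely by the voters who can vote for Party A in each case; the function name and the numbers are hypothetical and stand in for the full model.

```python
def implied_shift(gap, alpha, strategic_vote_for_a):
    """Gap in Party A support needed among the voters who can vote for A."""
    # In Case 1 only sincere voters (share 1 - alpha) can vote for A;
    # in Case 2 both sincere and strategic voters can.
    share_responsive = 1.0 if strategic_vote_for_a else 1.0 - alpha
    return gap / share_responsive


case1 = implied_shift(0.05, 0.8, strategic_vote_for_a=False)
case2 = implied_shift(0.05, 0.8, strategic_vote_for_a=True)
```

Under these assumptions the same 5% gap requires a 25-point difference in support among sincere voters in Case 1, but only a 5-point difference among all voters in Case 2, so the age effect consistent with the data is much larger in Case 1.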

The parameters on candidate characteristics, $\theta^{POS}$ and $\theta^{QLTY}$, can similarly be (partially) identified by taking municipalities across districts and relating the variation in the vote share and candidate characteristics. For example, the effect on utility of electing a candidate with no experience is identified by the difference in the vote shares between candidates with no experience and those with experience, controlling for other candidate and demographic characteristics.

4.2.2 Partial Identification of the Fraction of Strategic Voters


Second, we discuss the identification of the average fraction of strategic voters. In the following discussion, we fix the preference parameters, $\theta^{PREF}$, and consider the identification of the extent of strategic voting given $\theta^{PREF}$. Once this is accomplished, we can vary $\theta^{PREF}$ in the identified set of $\theta^{PREF}$ to trace out the identified set of the parameters that determine the extent of strategic voting.$^{23}$

Intuitively, the variation in the data that we would like to exploit is the variation in the 
voting outcome among municipalities (in different districts) with similar characteristics vis- 
a-vis the variation in the vote shares and characteristics of other municipalities in the same 
district. For example, consider two districts, one that is generally conservative and another 
that is liberal. Suppose that we can find a liberal municipality from each district. Suppose 
also that there are three candidates, a liberal, a centrist and a conservative candidate in 
both districts. If there are no strategic voters, we would not expect the voting outcome to 
differ across the two municipalities. However, in the presence of strategic voters, the voting 
outcome in these two municipalities could differ. If the strategic voters of the municipality 
in the conservative district believe that the liberal candidate has little chance of winning, 
those voters would vote for the centrist candidate, while the strategic voters in the other 
municipality (in the liberal district) would vote for the liberal candidate according to their 
preferences (if they believe that the liberal candidate has a high chance of winning). 

More generally, given the preference parameters, the model can predict what the vote share would be in each municipality if all of the voters voted according to their preferences. If there were no strategic voters, the difference between the actual outcome and the predicted sincere-voting outcome would only be due to random shocks. However, when there is a large number of strategic voters, the actual vote share can systematically diverge from the predicted outcome. This is due to the multiplicity of solution outcomes induced by strategic voters. Recall that strategic voters make voting decisions conditional on the event that their votes are pivotal. If the beliefs regarding the probability of being pivotal differ across electoral districts (and we have no reason to believe that they do not), the behavior of strategic voters will also differ across districts. This corresponds to different outcomes being played in different districts. The example in the previous paragraph is a manifestation of this. We use the difference between the predicted vote share and the actual vote share to partially identify the fraction of strategic voters.

23 Our two-step identification strategy can be schematically described as follows. Let $\Theta^{PREF}$ and $\Theta^{\alpha}$ be the parameter spaces for $\theta^{PREF}$ and $\theta^{\alpha}$ ($= \{\theta_{\alpha 1}, \theta_{\alpha 2}, \sigma_\xi\}$). First, we consider $I_1(\Theta^{\alpha}) \subset \Theta^{PREF}$, the identified set of $\theta^{PREF}$, given that we allow $\theta^{\alpha}$ to take any value in $\Theta^{\alpha}$. We then consider $I_2(I_1(\Theta^{\alpha})) \subset \Theta^{\alpha}$, the identified set of $\theta^{\alpha}$ given that we allow $\theta^{PREF}$ to take any value in $I_1(\Theta^{\alpha})$. We do not know whether $I_2(I_1(\Theta^{\alpha})) \subsetneq I_2(\Theta^{PREF})$, but the important fact is that $I_2(I_1(\Theta^{\alpha})) \subsetneq \Theta^{\alpha}$. This would be the case if $\exists\, \theta^{\alpha}$, $\nexists\, \theta^{PREF} \in \Theta^{PREF}$ such that $I_2(\theta^{PREF}) \ni \theta^{\alpha}$. Here, we illustrate this point by example. Let $\theta_{\alpha 1}$ and $\theta_{\alpha 2}$ be such that $\theta_{\alpha 1}/(\theta_{\alpha 1} + \theta_{\alpha 2}) \approx 0$. In this case, almost every voter votes according to his preferences. Thus, we would not expect the vote share of a municipality to be correlated with the demographic characteristics of other municipalities within the same electoral district. But it could well be the case that voting behavior in a very liberal municipality in a generally conservative electoral district is systematically different from the voting behavior in a very liberal municipality in a generally liberal district. There are no preference parameters that can rationalize such data patterns. Thus, $I_2(I_1(\Theta^{\alpha})) \subsetneq \Theta^{\alpha}$.

Our two-step procedure has empirical content because preferences are partly identified by demographic and vote-share variation within districts, while the parameters concerning the distribution of $\alpha$ are identified by variation across districts.

To further illustrate our identification argument, consider the case of three candidates. In this case, the vote shares in municipality $m$ can be drawn as a point in a simplex. Recall that given a particular value of $\alpha_m$ (the fraction of strategic voters in municipality $m$) and $T$, the vote shares can be written as a convex combination of the vote shares of sincere and strategic voters:

\[ v_m(T, \alpha_m) = (1 - \alpha_m)\, v_m^{SIN} + \alpha_m\, v_m^{STR}(T), \]

where $v_m$ is the vector of vote shares of the three candidates $(v_{1m}, v_{2m}, v_{3m})$, and similarly for $v_m^{SIN}$ and $v_m^{STR}$. Notice that here, we have made the dependence of $v_m$ on $\alpha_m$ explicit. Now define $A_m(\alpha_m)$ as the set of all possible vote shares when we vary $T$ in $\mathcal{T}$ (we denote the set of $T$ satisfying C1 by $\mathcal{T}$),

\[ A_m(\alpha_m) = \bigcup_{T \in \mathcal{T}} \{ v_m(T, \alpha_m) \}. \]

Note that $A_m(\alpha_m)$ and $A_m(1)$ are similar, by a factor of $\alpha_m$ around the singleton $A_m(0) = v_m^{SIN}$, because $\alpha_m$ is the weight of the convex combination. The dotted circle in Figure 3 corresponds to $A_m(1)$.
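The convex-combination structure, and the resulting similarity of the sets of attainable vote shares, can be checked numerically. In the sketch below the sincere vote shares and the two strategic vote-share vectors (standing in for two admissible beliefs) are hypothetical numbers.

```python
def mix(v_sin, v_str, alpha):
    """Vote shares when a fraction alpha of voters is strategic."""
    return tuple((1.0 - alpha) * s + alpha * t for s, t in zip(v_sin, v_str))


v_sin = (0.5, 0.3, 0.2)                                 # sincere vote shares
v_str_candidates = [(0.5, 0.5, 0.0), (0.7, 0.3, 0.0)]   # strategic shares, two beliefs
alpha = 0.4

A_alpha = [mix(v_sin, v, alpha) for v in v_str_candidates]  # points of A_m(alpha)
A_one = [mix(v_sin, v, 1.0) for v in v_str_candidates]      # points of A_m(1)

# Displacement from the sincere point, for alpha = 0.4 and for alpha = 1.
disp_alpha = [[p - s for p, s in zip(pt, v_sin)] for pt in A_alpha]
disp_one = [[p - s for p, s in zip(pt, v_sin)] for pt in A_one]
```

Each displacement from the sincere point at fraction 0.4 is exactly 0.4 times the corresponding displacement at fraction 1, which is the sense in which the two sets are similar around the sincere vote shares.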

For expositional purposes, we first present our identification argument when we can take the number of municipalities to go to infinity and the municipality level shock $\xi_m$ is close to zero. Consider a subset of municipalities in a single electoral district which all have the same demographic characteristics (note that this does not literally have to be the case because we can control for demographic characteristics once preference parameters are known). In this case, the vote share observations should all lie on the line segment between $A_m(0) = v_m^{SIN}$ and $v_m^{STR}(T)$ because these two endpoints are the same in all municipalities$^{24}$ and only the realizations of $\alpha_m$ vary across municipalities. Denote this support of the observed distribution as $\mathcal{L}$ and the endpoint of $\mathcal{L}$ as $L$ (the other endpoint is $v_m^{SIN} = A_m(0)$). We also define the point $L'$ where the extension of $\mathcal{L}$ intersects the boundary of $A_m(1)$ (see Figure 4). Note that $\mathcal{L}$, $A_m(0)$, and $L'$ are all identified. $\mathcal{L}$ is just the support of the observed vote shares, and $A_m(0)$ and $L'$ are identified once preferences are identified. On the other hand, the exact position of $v_m^{STR}(T)$ cannot be determined as $T$ is unknown. The only thing that we know about its location is that it lies somewhere on the dashed line segment between $L$ and $L'$.$^{25}$

24 To see this, recall that $A_m(0)$ is a function of demographic characteristics, and $v_m^{STR}(T)$ is a function of demographic characteristics and $T$. As the municipalities belong to the same district they share the same $T$, and they share the same demographic characteristics because of the way in which we selected them.

Figure 3: Vote Shares for the Case of $K = 3$. Vote shares $v_m(T, \alpha_m)$ are a mixture of sincere votes ($A_m(0) = v_m^{SIN}$) with fraction $1 - \alpha_m$ and strategic votes ($v_m^{STR}(T)$) with fraction $\alpha_m$.

Consider two polar cases, Case A and Case B in Figure 4. Case A depicts the situation where $v_m^{STR}(T)$ is at $L$ and Case B depicts the situation where $v_m^{STR}(T)$ is at $L'$. For each of the two cases, observations of vote shares can be mapped into realizations of $\alpha_m \in [0, 1]$. This mapping is different in Case A and Case B and results in different distributions of $\alpha$, as can be seen in Figure 4. Case A corresponds to the upper bound of the extent of strategic voting, and Case B provides the lower bound. We therefore can partially identify the distribution of $\alpha_m$ as well as the upper and lower bounds of its mean.
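The mapping from vote shares to fractions of strategic voters in the two polar cases can be sketched as follows. The sketch assumes that the observations lie exactly on a line through the sincere vote shares (no municipality-level shock), and the coordinates of the sincere point, the observations, and the two candidate endpoints are hypothetical.

```python
from math import dist


def implied_alphas(observations, v_sin, v_str):
    """Fraction of strategic voters implied by each observation.

    Because v = (1 - alpha) * v_sin + alpha * v_str, alpha equals the distance
    of v from v_sin relative to the distance of v_str from v_sin.
    """
    span = dist(v_sin, v_str)
    return [dist(v, v_sin) / span for v in observations]


v_sin = (0.5, 0.3, 0.2)
observations = [(0.5, 0.32, 0.18), (0.5, 0.35, 0.15), (0.5, 0.40, 0.10)]
L_end = (0.5, 0.40, 0.10)      # endpoint of the observed support
L_prime = (0.5, 0.50, 0.00)    # hypothetical point on the boundary of the attainable set

alphas_case_a = implied_alphas(observations, v_sin, L_end)     # strategic point at L
alphas_case_b = implied_alphas(observations, v_sin, L_prime)   # strategic point at L'
mean_a = sum(alphas_case_a) / len(alphas_case_a)
mean_b = sum(alphas_case_b) / len(alphas_case_b)
```

Placing the strategic vote shares at the endpoint of the observed support yields fractions of 0.2, 0.5 and 1.0, whereas placing them at the farther point halves every fraction, so the first case bounds the mean from above and the second from below.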

Now we discuss how we can modify this discussion to the case where the number of municipalities is finite but the number of districts goes to infinity. Parallel to the previous argument, consider subsets of municipalities from each district with the same demographic characteristics. The key differences from the previous situation are that (1) even if we condition on the same demographics, $v_m^{STR}(T)$ differs across districts because $T$ is not the same across districts, and (2) we can only take a finite number of municipalities from the same district. Figure 5 illustrates the case where we have three municipalities from two districts. Notice that $A_m(0)$ is the same across these municipalities because the demographics are the same. However, as municipalities in different districts have different $T^d$, the vote share data will be on different line segments for different districts. As in the previous argument, consider two polar cases, Case A’ and Case B’ in Figure 5. Case A’ is the situation where $v_m^{STR}(T)$ is at $L_d$, and Case B’ corresponds to the situation where $v_m^{STR}(T)$ is at $L'_d$. For each of the two cases, we can map the vote share observations into realizations of $\alpha_m \in [0, 1]$. Note that even though the number of municipalities in a given district is finite, by taking the number of districts to infinity, we can obtain an infinite number of $\alpha_m$’s on $[0, 1]$ that are transformed from the vote share observations. Note that Case A’ gives the upper bound of the distribution of $\alpha_m$, and Case B’ gives the lower bound. Thus, we set-identify the distribution of $\alpha_m$.

25 This is because vote shares are given by $v_m(T, \alpha_m) = (1 - \alpha_m)\, v_m^{SIN} + \alpha_m\, v_m^{STR}(T)$, so that any point on $\mathcal{L}$ must lie between $A_m(0)$ and $v_m^{STR}(T)$.

Figure 4: Partial Identification of the Extent of Strategic Voting When $D = 1$ and $M^d \to \infty$. Vote share observations map differently into different values of $\alpha$ depending on the position of $v_m^{STR}(T)$. Case A corresponds to the upper bound of the distribution, and Case B to the lower bound.

In the actual data, the vote shares may not lie on the same line segment as in Figure 5, even when we take observations from municipalities with the same demographics. Recall that $\xi_m$ is the municipality level shock that accounts for this kind of variation. It is true that if we do not restrict the distribution of $\xi_m$ in any way, it may not be possible to separately identify the distributions of $\xi_m$ and $\alpha_m$ nonparametrically. However, it should be intuitive from Figure 5 that if we restrict the distribution of $\xi_m$ to well-behaved distributions which are mean-zero and unimodal, the same intuition would carry through. We assume that the random shock $\xi_m$ follows a Normal distribution with mean zero. Then, we can parametrically account for the dispersion of vote shares around the line segment and the above identification discussion remains valid.

Figure 5: Partial Identification of the Extent of Strategic Voting When $D \to \infty$, but $M^d < \infty$. The figure illustrates the situation when there are two districts with three municipalities each. Case A’ corresponds to the upper bound, and Case B’ to the lower bound.

Finally, we describe how to extend our argument when preference parameters are only partially identified. For each $\theta^{PREF}$ in the identified set, we can partially identify the extent of strategic voting by following our previous argument. To the extent that preference parameters are only partially identified, we can vary $\theta^{PREF}$ in the identified set: this allows us to trace out the identified set of the extent of strategic voting.


4.3 Estimation 


At the outset, it is useful to clarify the set of parameters that we estimate: they are the preference parameters, $\theta^{PREF}$, the parameters of the distribution of strategic voters, $(\theta_{\alpha 1}, \theta_{\alpha 2})$, and the variance of $\xi$, $\sigma_\xi$. It is important to note that we do not estimate the beliefs $T$. This is because our unit of observation is the district, and as the number of districts increases, so does the number of tie beliefs $T$. Because we cannot treat $T$ as parameters, we need restrictions that do not involve $T$.

We estimate the model using an inequality-based estimator developed by Pakes, Porter, Ho, and Ishii (2007). If voter beliefs, $T$, were known (either observed, or uniquely determined by the model), a single outcome would correspond to one realization of the unobserved error terms $(\xi, \alpha)$. In such a case, we could employ estimation procedures such as GMM or MLE. However, as discussed in Section 4.2, the multiplicity of outcomes induced by the presence of strategic voters, together with the fact that we cannot observe voter beliefs, $T$, implies that the model parameters are only partially identified. This makes the use of a set-based estimator appropriate.

We construct the moment inequalities using an idea which is somewhat similar to indi- 
rect inference (Smith (1993) and Gouriéroux, Monfort, and Renault (1993)). The following 
explains the steps we take to construct the moment inequalities. A more detailed description 


of each step is found in Appendix B. 


1. Take some district $d$ and denote the municipalities that belong to this district as $\{1, 2, \ldots, M^d\}$. Regress the vote share data of candidate $k$ in each municipality, $v_{km}^{data}$, on the demographics of each municipality, $f_m$,$^{26}$ to obtain the regression coefficient $\beta_{k,d}^{data} = (f_d' f_d)^{-1} f_d' v_{k,d}^{data}$, where $v_{k,d}^{data} = (v_{k1}^{data}, \ldots, v_{kM^d}^{data})'$ and $f_d = (f_1, \ldots, f_{M^d})'$. Note that we obtain $K$ coefficients for each district.

2. Fix some parameter $\theta$ and beliefs of voters, $T^d$. Also fix particular values of $\alpha_d = \{\alpha_m\}_{m=1}^{M^d}$ and $\xi_d = \{\xi_m\}_{m=1}^{M^d}$, which are the fractions of strategic voters and the candidate-municipality shocks, respectively. Given $\theta$, $T^d$, $\alpha_d$ and $\xi_d$, compute the predicted vote share outcome for each municipality in the district, $\left(v_{k1}^{PRED}(T^d, \alpha_1, \xi_1; \theta), \ldots, v_{kM^d}^{PRED}(T^d, \alpha_{M^d}, \xi_{M^d}; \theta)\right)$.

3. Parallel to step 1, regress the simulated vote share, $v_{km}^{PRED}(T^d, \alpha_m, \xi_m; \theta)$, on the demographic characteristics in each municipality, $f_m$, to obtain the regression coefficient $\beta_{k,d}(T^d, \alpha_d, \xi_d; \theta) = (f_d' f_d)^{-1} f_d' v_{k,d}^{PRED}(T^d)$, where $v_{k,d}^{PRED}(T^d) = \left(v_{k1}^{PRED}(T^d, \alpha_1, \xi_1; \theta), \ldots, v_{kM^d}^{PRED}(T^d, \alpha_{M^d}, \xi_{M^d}; \theta)\right)'$.

4. Because we do not know $T^d$, we vary $T^d \in \mathcal{T}(v_d^{data})$ to obtain the minimum and maximum values of the regression coefficients as

\[ \underline{\beta}_{k,d}(\alpha_d, \xi_d; \theta) = \min_{T^d \in \mathcal{T}(v_d^{data})} \beta_{k,d}(T^d, \alpha_d, \xi_d; \theta), \quad \text{and} \]

\[ \bar{\beta}_{k,d}(\alpha_d, \xi_d; \theta) = \max_{T^d \in \mathcal{T}(v_d^{data})} \beta_{k,d}(T^d, \alpha_d, \xi_d; \theta), \]

where $v_d^{data} = \left( \frac{\sum_{m=1}^{M^d} N_m v_{1m}^{data}}{\sum_{m=1}^{M^d} N_m}, \ldots, \frac{\sum_{m=1}^{M^d} N_m v_{Km}^{data}}{\sum_{m=1}^{M^d} N_m} \right)$ is the district level vote share data and $\mathcal{T}(v_d^{data})$ is defined as the set of beliefs that are consistent with condition C1. Recall that C1 requires that beliefs be consistent with the vote share outcome.

26 We used $f_m$ to denote the distribution of demographic characteristics $x$ in municipality $m$ in Section 2. If we discretize $f_m$, we can identify $f_m$ with a vector of probabilities. We use the same notation $f_m$ to denote the distribution and the vector of probabilities. The vector $f_m$ contains, for example, the fraction of the population above 65, the fraction of the population in different income ranges, etc.


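5. Integrate out $\alpha_d$ and $\xi_d$ by simulating values of $\alpha_d$ and $\xi_d$ from $F_\alpha$ and $F_\xi$, and obtain $\underline{\beta}_{k,d}(\theta) = \iint \underline{\beta}_{k,d}(\alpha_d, \xi_d; \theta)\, dF_\alpha\, dF_\xi$ and $\bar{\beta}_{k,d}(\theta) = \iint \bar{\beta}_{k,d}(\alpha_d, \xi_d; \theta)\, dF_\alpha\, dF_\xi$.

6. Then, by construction, we have $E[\underline{\beta}_{k,d}(\theta_0)] \le E[\beta_{k,d}^{data}] \le E[\bar{\beta}_{k,d}(\theta_0)]$ at the true parameter $\theta_0$. Thus, we obtain the following moment inequalities:

\[ E[\underline{\beta}_{k,d}(\theta_0) - \beta_{k,d}^{data}] \le 0, \quad \text{and} \]

\[ E[\bar{\beta}_{k,d}(\theta_0) - \beta_{k,d}^{data}] \ge 0. \]

Moreover, we can construct moment inequalities conditioning on candidate characteristics $z$ ($z$ only takes discrete values).$^{27}$ We can do so by running the regressions in steps 1 and 3 only on a subset of the sample for which candidate characteristics $z$ takes a particular value:

\[ E[\underline{\beta}_{k,d}(\theta_0) - \beta_{k,d}^{data} \mid z] \le 0, \quad \text{and} \]

\[ E[\bar{\beta}_{k,d}(\theta_0) - \beta_{k,d}^{data} \mid z] \ge 0. \]

The identified set is the set of $\theta$ that satisfy the above inequalities.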


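7. We base our estimation on the conditional moment inequalities. We take the sample analog of the conditional moment inequalities by repeating steps 1 through 5 for each district. Then, by taking the average, we obtain the criterion function

\[ Q^{-}(\theta) = \sum_{z,k} \left\| \frac{1}{D} \sum_{d} \left[ \bar{\beta}_{k,d}(\theta) - \beta_{k,d}^{data} \right] \right\|_{-}, \]

\[ Q^{+}(\theta) = \sum_{z,k} \left\| \frac{1}{D} \sum_{d} \left[ \underline{\beta}_{k,d}(\theta) - \beta_{k,d}^{data} \right] \right\|_{+}, \]

where $\|a\|_{+} = \max\{0, a\}$ and $\|a\|_{-} = \min\{0, a\}$. We then apply Pakes, Porter, Ho, and Ishii (2007).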
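The logic of the steps above can be sketched for a single district, a single candidate, and a single demographic regressor. The sketch makes several simplifying assumptions that are not part of the procedure itself: the predicted vote shares under each admissible belief are supplied directly as hypothetical numbers rather than computed from the model, the set of admissible beliefs is replaced by a two-point grid, and the integration over the fractions of strategic voters and the shocks is omitted.

```python
def ols_coef(f, v):
    """(f'f)^{-1} f'v for a single regressor f (no intercept)."""
    return sum(fi * vi for fi, vi in zip(f, v)) / sum(fi * fi for fi in f)


def coef_bounds(f, predicted_by_belief):
    """Minimum and maximum coefficient over a grid of admissible beliefs."""
    coefs = [ols_coef(f, v) for v in predicted_by_belief]
    return min(coefs), max(coefs)


def violation(lower, upper, beta_data):
    """Positive part of (lower - data) and negative part of (upper - data)."""
    return max(0.0, lower - beta_data), min(0.0, upper - beta_data)


f = [0.20, 0.30, 0.25]         # share above 65 in three municipalities
v_data = [0.40, 0.45, 0.42]    # observed vote shares of candidate k
predicted_by_belief = [        # hypothetical model predictions, one row per belief
    [0.38, 0.44, 0.41],
    [0.42, 0.48, 0.45],
]

beta_data = ols_coef(f, v_data)
lower, upper = coef_bounds(f, predicted_by_belief)
satisfied = violation(lower, upper, beta_data)
violated = violation(lower, upper, 2.0)   # a data coefficient above the upper bound
```

When the data coefficient lies between the two bounds both terms are zero, so a candidate parameter value is not penalized; when it lies above the upper bound the negative-part term becomes strictly negative, signalling a violated inequality.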


Note that in computing the predicted vote shares in Step 3, we use $v_{km}(T)$ in equation ([illegible]). $v_{km}(T)$ is the infinite counterpart of the vote share $V_{km}(T)$ in equation (3); that is, the probability limit of $V_{km}(T)$ when the number of voters tends to infinity. Of course, the number of voters in each municipality is finite,$^{28}$ but this is not a problem as long as the error from approximating the vote share by its infinite counterpart is sufficiently small compared to the variance of other error terms in the model.

27 $z$ only includes variables such as indicators for party affiliation and hometown as described in Section


Extending the Model to Include Voter Turnout. Our approach can be extended to include the voter’s turnout decision. We can, for example, introduce a cost of voting (or a consumption value of voting) into our model, and allow the voters to abstain. In terms of the standard discrete choice model, this would be analogous to the inclusion of an outside option (e.g., not buying a good). Of course, with this modification, we would no longer be able to normalize $T$ to sum up to 1 (i.e., $\sum_{k} \sum_{l > k} T_{kl} = 1$) as we do in our paper. The scale of $T$ matters for turnout. However, it should be straightforward in principle to identify and estimate a model with voter turnout. The scale of $T$ would be identified by the level of turnout. Then, the identification of the model parameters would follow along the lines of the discussion in Section 4.2. Estimation would proceed by simulating the vote shares and turnout for all possible values of $T$, including those that do not add up to 1.

In this paper, we focus only on the issue of strategic voting for computational reasons. In the standard pivotal voter model, turnout is sensitive to small changes in $T$. For example, a change in $T$ from $10^{-11}$ to $10^{-10}$ increases the voter’s utility of turning out tenfold. This means that we would need to simulate the outcome on a grid in the space of pivot probabilities that is fine enough to clearly differentiate the values $10^{-11}$ and $10^{-10}$ (and those in between). Hence, the computational cost of implementing this approach could be very high.


5 Results and Counterfactual Experiments 


5.1 Parameter Estimates 


The confidence intervals for the parameters are reported in Table 4. The exact specification of the utility function we estimate is

$$u(x_n, z_{km}) = -\left( [\theta^{const}, \theta^{income}, \theta^{education}, \theta^{above65}, \theta^{below65}]\, x_n - [\theta^{LDP}, \theta^{JCP}, \theta^{DPJ}, \theta^{YUS}]\, z^{POS}_{km} \right)^2$$
$$\qquad + [\theta^{incumbent}, \theta^{previous}, \theta^{no\_experience}, \theta^{hometown1}, \theta^{hometown2}, \theta^{hometown3}, \theta^{hometown4}]\, z^{QLTY}_{km}$$
$$\qquad + \xi_{km} + \varepsilon_{nk},$$


28 The average number of voters in a municipality is more than 43,000.

Parameter                   Confidence Interval
$\theta_{\alpha 1}$         [5.210, 6.005]
$\theta_{\alpha 2}$         [1.473, 1.706]
$\theta_{\xi}$              [0.373, 0.385]
$\theta^{hometown1}$        [0.437, 0.444]
$\theta^{hometown2}$        [0.180, 0.187]
$\theta^{hometown3}$        [0.038, 0.041]
$\theta^{const}$            [-1.490, -1.418]
$\theta^{income}$           [-0.164, -0.162]
$\theta^{education}$        [0.177, 0.179]
$\theta^{above65}$          [-0.003, -0.001]
$\theta^{YUS}$              [-0.068, -0.065]
$\theta^{JCP}$              [-3.467, -3.448]
$\theta^{DPJ}$              [-2.998, -2.990]
$\theta^{previous}$         [-0.204, -0.199]
$\theta^{no\_experience}$   [0.080, 0.083]

Table 4: Confidence Intervals. Confidence intervals reported are asymptotically more conservative than 95%. These confidence intervals are calculated following Pakes, Porter, Ho, and Ishii (2007).


where we use the normalizations $\theta^{below65} = 0$, $\theta^{incumbent} = \theta^{hometown4} = 0$, and $\theta^{LDP} = 0$.^29

First, we discuss our parameter estimates for the first term of the utility function. This term captures the ideological component of the voter’s utility and is written as a function of the distance between the voter’s ideological position and the candidates’ ideological positions. We have estimated the ideological positions of the candidates’ parties as $\theta^{JCP} = [-3.467, -3.448]$, $\theta^{DPJ} = [-2.998, -2.990]$, and $\theta^{YUS} = [-0.068, -0.065]$, where $\theta^{LDP} = 0$ by normalization. We can interpret this result as follows. The JCP and the DPJ are close in ideological space relative to the position of the LDP and the YUS, but compared with the JCP, the position of the DPJ is slightly closer to the LDP and the YUS. This is consistent with the general understanding that on the left-right spectrum, the JCP is very liberal, the DPJ is moderately liberal, and the LDP and the YUS are moderately conservative. Regarding voter positions, a voter with a lower income, more years of schooling, and an age below 65 is ideologically closer to candidates from the LDP and the YUS than to candidates from the DPJ and the JCP.

29 If we let the first three elements of the vector $z^{QLTY}_{km}$ be dummy variables for whether candidate $k$ (1) has been an incumbent, (2) has had previous political experience, or (3) has had no political experience, then the first three elements of $z^{QLTY}_{km}$ add up to 1: $z^{QLTY}_{km}(1) + z^{QLTY}_{km}(2) + z^{QLTY}_{km}(3) = 1$. Thus we need to normalize one of the coefficients. (The fact that we are dealing with a discrete choice model precludes us from including a constant term as well.) For the same reason, $\theta^{below65}$ and $\theta^{hometown4}$ are normalized to 0. As for $\theta^{LDP}$, this is normalized to 0 because only the difference between the candidate’s ideology, $\theta^{POS} z^{POS}_{km}$, and the voter’s ideology, $\theta^{ID} x_n$, matters. Note that because we include a constant term in $x_n$, one of the elements in $\theta^{POS}$ can be normalized to zero.

The estimates of the parameters on candidate experience are $\theta^{previous} = [-0.204, -0.199]$ and $\theta^{no\_experience} = [0.080, 0.083]$, where $\theta^{incumbent} = 0$ by normalization. $\theta^{previous}$ measures the effect of previously having held public office, and $\theta^{no\_experience}$ measures the effect of not having had any experience in public office. The negative estimate of $\theta^{previous}$ means that incumbents have an advantage over non-incumbent candidates with previous political experience. The positive estimate of $\theta^{no\_experience}$ implies that candidates with no prior experience do slightly better than incumbents. This may seem somewhat surprising, but the biggest issue in this election was postal reform, which pitted old-guard politicians against new challengers. Our result can be interpreted as voters preferring fresh candidates to both incumbents and candidates with previous experience.

Hometown effects are estimated as $\theta^{hometown1} = [0.437, 0.444]$, $\theta^{hometown2} = [0.180, 0.187]$, and $\theta^{hometown3} = [0.038, 0.041]$, where $\theta^{hometown4} = 0$ by normalization. The parameter $\theta^{hometown1}$ captures the effect of having a hometown in the same municipality as the voter, and $\theta^{hometown2}$ is the effect of having a hometown in the same electoral district but in a different municipality. $\theta^{hometown3}$ is the effect of having a hometown in the same prefecture as the voter but not in the same electoral district, and lastly, $\theta^{hometown4} = 0$ is the effect of having a hometown in a different prefecture. The results show that voters receive the highest utility from a candidate whose hometown is in the same municipality as theirs, and the utility decreases as the distance between the candidate’s hometown and the voters’ municipality increases.

Finally, the mean of the distribution of strategic voters, $\theta_{\alpha 1}/(\theta_{\alpha 1} + \theta_{\alpha 2})$, is estimated to be between 0.753 and 0.803; that is, [75.3%, 80.3%] of voters are strategic voters on average. This may sound surprising given that the fraction of strategic voting reported in previous studies is between 3% and 17%. However, note that the fraction of “strategic voting” reported in previous studies is in fact the fraction of misaligned voting, as discussed in the Introduction, and does not follow the standard definition of strategic voting (see, e.g., the entry on “strategic voting” in The New Palgrave Dictionary of Economics by Feddersen (2008)). Misaligned voting is an equilibrium behavior of strategic voters, and strategic voters may or may not vote for their most preferred candidate. In order to compare our result with the previous studies, we use the estimated model to compute the extent of misaligned voting in the next subsection.

5.2 Extent of Misaligned Voting


The extent of misaligned voting is given by the fraction of voters who do not vote for the 
most preferred candidate. Because we do not have any individual voting records (we only 
observe vote shares at the municipality level), we still face the task of identifying the extent 
of misaligned voting from aggregate data; that is, from the difference in the actual vote 
shares and the counterfactual vote shares we simulate using the estimated model, under the 
assumption that all voters vote sincerely. Identifying the extent of misaligned voting is not 
straightforward because there could be misaligned voting at the individual level, but the 
inflow of misaligned votes to candidate k (i.e., votes cast for candidate k by voters who do 
not prefer k the most) and the outflow of misaligned votes from candidate k may cancel 
each other out in the aggregate at the municipality level. Additionally, computing what the 
outcome would have been if all voters voted sincerely is itself not a simple task. This is 
because (1) the realization of municipality level shocks (€) cannot be uniquely recovered and 
(2) the model parameters are set identified. We describe how to deal with these issues in 
Appendix C. 

We obtained the lower and upper bounds of misaligned voting as 2.4% and 5.5%; that is, about [2.4%, 5.5%] of all voters voted for a candidate that they did not prefer most. Our estimates of misaligned voting are comparable to the numbers reported in the existing literature, which range from 3% to 17%. Also, given that the estimated fraction of strategic voters is about [75.3%, 80.3%] of the population on average, the fraction of strategic voters who did not vote for their most preferred candidate is [3.0%, 7.3%].
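The reported intervals can be checked with simple interval arithmetic. The sketch below is illustrative only. It assumes that the mean share of strategic voters is $\theta_{\alpha 1}/(\theta_{\alpha 1} + \theta_{\alpha 2})$ evaluated at the endpoints of the Table 4 intervals, and that only strategic voters cast misaligned votes, so that the conditional fraction is the ratio of the two shares. The endpoint pairing is our assumption about how the extremes arise, not a description of the estimation routine.

```python
# Interval endpoints taken from Table 4 and Section 5.2.
theta_a1 = (5.210, 6.005)
theta_a2 = (1.473, 1.706)
misaligned = (0.024, 0.055)  # share of all voters casting a misaligned vote


def mean_share(a1, a2):
    """Mean share of strategic voters, a1 / (a1 + a2)."""
    return a1 / (a1 + a2)


# The ratio increases in a1 and decreases in a2 (for positive values),
# so the extremes are attained at opposite corners of the two intervals.
strategic_lo = mean_share(theta_a1[0], theta_a2[1])
strategic_hi = mean_share(theta_a1[1], theta_a2[0])

# Assumption: misaligned votes come only from strategic voters, so the
# conditional fraction is (misaligned share) / (strategic share).
cond_lo = misaligned[0] / 0.803
cond_hi = misaligned[1] / 0.753

print(round(strategic_lo, 3), round(strategic_hi, 3))
print(round(100 * cond_lo, 1), round(100 * cond_hi, 1))
```

Under these assumptions the first line reproduces [0.753, 0.803] and the second reproduces [3.0, 7.3] percent.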


5.3 Counterfactual Experiments 
5.3.1 Proportional Representation 


In our first counterfactual experiment, we consider what the election outcome would have 
been under proportional representation instead of plurality rule. In a typical election under 
proportional representation, voters cast ballots for parties rather than for individual candi- 
dates and parties are allotted seats in proportion to the vote share. As votes would not be 
wasted under proportional representation, there is little incentive for voters to vote strate- 
gically. Thus, minor parties generally gain more votes and seats than they would under 
plurality rule. 

We computed the counterfactual vote share by assuming that all voters vote for the party whose ideological position is closest to their own.^30 We also allowed the voters to vote for any of the four parties regardless of whether a party actually fielded a candidate in the voter’s district or not. Hence, there are two effects that account for the difference in the vote shares between the actual election and the counterfactual experiment. One effect is the change in the behavior of strategic voters (sincere-voting effect). The second is the effect of expanding the choice set (choice-expansion effect). The second effect is present because in the counterfactual experiment, we let the voters vote for parties regardless of whether a party fielded a candidate in the voter’s district. In our next counterfactual experiment, we will try to isolate and quantify each of the two effects.

30 We only used the party position to compute the counterfactual outcome because candidate-specific characteristics do not play a role in proportional representation.

                        JCP              DPJ              LDP              YUS
Actual (Plurality)
  Vote Share (%)        7.8              38.4             50.0             3.9
  Number of Seats       0                35               131              9
Counterfactual (PR)
  Vote Share (%)        [7.40, 8.29]     [31.82, 33.55]   [26.56, 27.50]   [31.87, 33.02]
  Number of Seats       [13.61, 14.43]   [55.69, 58.72]   [46.47, 48.13]   [55.77, 57.79]
Number of Seats is calculated as (vote share) x 175.

Table 5: Counterfactual Experiment: Proportional Representation. Actual vote share is computed by aggregating the number of votes for a party across all of the 175 districts and dividing it by the total number of votes cast in the 175 districts. Thus they add up to 100% (cf. Table 6).

Table 5 compares the vote shares and the number of seats each party obtains in the experiment with the actual data under plurality rule. First, the vote shares for the DPJ and the LDP would be smaller under proportional representation. As we will confirm in the next counterfactual experiment, a large part of the decrease can be explained by the choice-expansion effect. Second, the vote share for the YUS would be larger in the counterfactual experiment. The fact that the YUS did not field candidates in many districts increased its vote share under the counterfactual through the choice-expansion effect. (We find almost no sincere-voting effect for the YUS in the next experiment.)

As for the number of seats in the counterfactual experiment, we simply multiplied the vote share of each party by the total number of districts (175). The difference between the actual and the counterfactual is even greater for the number of seats than for vote shares because votes are translated very differently into seats under plurality rule and proportional representation.

                        JCP              DPJ              LDP              YUS
Actual
  Vote Share (%)        7.4              38.4             49.7             35.0
  Number of Seats       0                35               131              9
Counterfactual
  Vote Share (%)        [8.39, 10.19]    [40.60, 43.77]   [42.63, 45.73]   [33.93, 38.77]
  Number of Seats       [0, 0]           [52, 75]         [86, 111]        [11, 18]

Table 6: Counterfactual Experiment: Sincere Voting under Plurality Rule. Actual vote share is computed by taking the average of the vote shares only over districts in which the party fielded a candidate. Thus, they do not add up to 100% (cf. Table 5).


5.3.2 Sincere Voting under Plurality Rule 


In our second counterfactual experiment, we investigate what the outcome would have been if 
all voters had voted sincerely under plurality rule. It is well known from Gibbard (1973) and 
Satterthwaite (1975) that there does not exist a strategy-proof voting mechanism (except 
for a dictatorial mechanism or a mechanism in which a particular candidate is never chosen 
under any circumstances). Even though a strategy-proof voting mechanism does not exist, we 
can simulate the sincere-voting outcome under any mechanism because we have recovered 
the primitives of the model. In this experiment, we compute the sincere-voting outcome 
under plurality rule. This exercise also enables us to isolate the sincere-voting effect as we 
discussed in the previous subsection. 

Table 6 compares the actual vote shares and the number of seats with those of the sincere- 
voting experiment (Note that the vote shares do not add up to 100% because the vote shares 
are computed by taking the average of the vote shares only over districts in which the party 
fielded a candidate). The details on how we obtained Table 6 are provided in Appendix D. 

We find that the number of seats for the DPJ and the LDP changes significantly in spite of the fact that the extent of misaligned voting is small [2.4%, 5.5%]. The DPJ would add [17, 40] seats and the LDP would lose [20, 45] seats. Compared to the relatively small change in the vote share, the change in the number of seats is considerable. Note that this difference in the number of seats is accounted for by misaligned voting. Even though the extent of misaligned voting is small, the impact on the number of seats is large because the winning margin is often small.
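The seat changes quoted above follow directly from Table 6. As a minimal check, the sketch below subtracts the actual seat counts from the counterfactual intervals; the dictionary layout is ours and is used only for illustration.

```python
# Seat counts from Table 6 (actual) and counterfactual intervals.
actual_seats = {"DPJ": 35, "LDP": 131}
counterfactual_seats = {"DPJ": (52, 75), "LDP": (86, 111)}

# Change in seats = counterfactual interval minus the actual count.
seat_change = {
    party: (lo - actual_seats[party], hi - actual_seats[party])
    for party, (lo, hi) in counterfactual_seats.items()
}

print(seat_change)
```

The DPJ interval is a gain of 17 to 40 seats, and the LDP interval is a loss of 20 to 45 seats, matching the text.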

With respect to vote shares, we find that the vote shares for the JCP and the DPJ increase while the vote share for the LDP decreases in our experiment. This is what we would expect given that the LDP candidates tended to be strong while some fraction of the DPJ candidates and an even greater fraction of the JCP candidates were not. On the other hand, we find that the sincere-voting effect for the YUS is nearly zero. This implies that the gain in the vote share for the YUS in the previous counterfactual experiment is due mostly to the choice-expansion effect. Our findings also suggest that a large part of the decrease in the vote shares for the LDP and the DPJ in the previous experiment is due to the choice-expansion effect. Lastly, given that the vote share for the JCP remains almost unchanged in the previous experiment, the choice-expansion effect and the sincere-voting effect for the JCP were of similar magnitude but worked in opposite directions.


6 Concluding Remarks 


In this paper, we study how to identify and estimate a model of strategic voting and quantify 
its impact on election outcomes by adopting an inequality-based estimator. Preference and 
voting behavior do not necessarily have a one-to-one correspondence for strategic voters, 
and we obtain partial identification of preference parameters from the restriction that voting 
for the least preferred candidate is a weakly dominated strategy. The extent of strategic 
voting is identified using particular features of general-election data. We also make a clear 
distinction between strategic voting and misaligned voting. 

By using aggregate data from the Japanese general election, we find that a large proportion of voters are strategic voters. We estimate the fraction of strategic voters to be [75.3%, 80.3%], on average. A counterfactual experiment that introduces proportional representation decreases the number of votes for major-party candidates by a large margin, and the number of seats by an even greater margin. In the second counterfactual experiment, which assumes sincere voting by all voters under plurality rule, we find that the number of seats for the parties changes significantly. Even though the extent of misaligned voting is small [2.4%, 5.5%], the impact on the number of seats is considerable because the winning margin is often small.

One of the important issues that we did not deal with in this paper is voter turnout. Voters’ beliefs on pivot events are also important for models of voter turnout, and it may be possible to extend our approach in this direction. We leave this for future research.

7 Appendix


7.1 Appendix A: Existence of Solution Outcome 


We provide a proof of the existence of the solution outcome. It is almost identical to the proof in MW. Take some $\epsilon \in (0,1)$. We define a mapping $\Phi$ from the product space of vote shares $\mathcal{V}$ $(= \Delta^{K \times M})$ and tie probabilities $\mathcal{T}$ $(= \Delta^{{}_K C_2})$ to its power set $2^{\mathcal{V} \times \mathcal{T}}$ so that a fixed point of the mapping is an element of the solution outcome. Before we define $\Phi$, let us first define $\Phi_1$ to be a mapping from $\mathcal{V} \ni V = (V_1, \ldots, V_K)$ to $2^{\mathcal{T}}$: $\Phi_1(V) = \{T \in \mathcal{T} \mid V_k > V_l \Rightarrow T_{kn} \ge \frac{1}{\epsilon} T_{ln}, \ \forall k, l, n\}$. $\Phi_1(V)$ is the set of tie probabilities that satisfy a stronger version of C1 (because $\epsilon \in (0,1)$). $\Phi_1$ is non-empty valued, convex-valued, and upper hemicontinuous. Now define $\Phi_2$ to be a mapping from $\mathcal{T}$ to $2^{\mathcal{V}}$ as $\Phi_2(T) = \{(V_{km}(T))_{k=1, m=1}^{K, M}\}$, where $V_{km}(T)$ is defined by C2. $\Phi_2(T)$ is a singleton set. $\Phi_2$ is also non-empty valued, convex-valued, and upper hemicontinuous. Now we define $\Phi: \mathcal{V} \times \mathcal{T} \ni (V, T) \mapsto \Phi(V, T) = (\Phi_2(T), \Phi_1(V)) \in 2^{\mathcal{V} \times \mathcal{T}}$. Then $\Phi$ is also non-empty valued, convex-valued, and upper hemicontinuous. By applying Kakutani’s fixed point theorem to $\Phi$, we know that there exists a fixed point of $\Phi$. As the fixed point satisfies C1 and C2, the solution outcome is nonempty.


7.2 Appendix B: Estimation 


We use municipality-level aggregate data for our estimation. We denote the vote-share data of candidate $k$ in municipality $m$ by $v^{data}_{km}$. We use $f_m$ to denote the distribution of demographic characteristics $x$ in municipality $m$. We let $\varepsilon_n = (\varepsilon_{nk})_{k=1}^{K}$ denote the $K$ draws of the individual-candidate-specific shock, and we let $g$ denote the distribution of $\varepsilon_n$. Similarly, denote $\xi_m = (\xi_{km})_{k=1}^{K}$. Lastly, candidate $k$’s characteristics are denoted by $z_{km}$.

Recall that, as in equation (3), we can express the vote share for candidate $k$ in municipality $m$ as a composition of vote shares among strategic and sincere voters:

$$v^{data}_{km} = (1 - \alpha_m)\, v^{SIN}_{km}(\xi_m; \theta_0) + \alpha_m\, v^{STR}_{km}(T^d, \xi_m; \theta_0), \qquad (4)$$

where

$$v^{SIN}_{km}(\xi_m; \theta) = \int \int 1\{u_{nk} \ge u_{nl}, \ \forall l\}\, g(\varepsilon)\, d\varepsilon\, f_m(x)\, dx,$$
$$v^{STR}_{km}(T^d, \xi_m; \theta) = \int \int 1\{\bar{u}_{nk}(T^d) \ge \bar{u}_{nl}(T^d), \ \forall l\}\, g(\varepsilon)\, d\varepsilon\, f_m(x)\, dx,$$

are the expressions for the vote share for candidate $k$ among sincere and strategic voters, respectively.

Now, we construct moment inequalities based on the regression coefficients in each electoral district.


Step 1 Take some $z$ and some district $d$. We obtain $\beta^{data}_{k,d}$ by regressing the vote share data $(v^{data}_{k1}, \ldots, v^{data}_{kM^d})$ on the demographics in each municipality $(f_1, \ldots, f_{M^d})$,^31 i.e.,

$$\beta^{data}_{k,d} = \arg\min_{\beta} \sum_{m=1}^{M^d} 1\{z_{km} = z\}\, (v^{data}_{km} - \beta \cdot f_m)^2.$$

Because $M^d$ is not large, we cannot include many regressors. The number of regressors must be less than $M^d$. For this reason, we run 9 different types of regressions, all involving just a constant or a constant and one component of $f_m$. For example, we run a regression of $v^{data}_{km}$ on a constant and the fraction of the population above 65 years old, conditional on $z_{km}$ = LDP. The full set of regressions we use is in the Supplementary Material.
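Step 1 amounts to an ordinary least-squares fit on the subsample of municipalities selected by the indicator $1\{z_{km} = z\}$. The following is a minimal sketch under that reading. The function name `district_regression`, the toy data, and the category labels are hypothetical and serve only to illustrate the masking and the fit.

```python
import numpy as np


def district_regression(v, f, z, z_value):
    """OLS of vote shares v (M,) on regressors f (M, p),
    restricted to municipalities where z equals z_value."""
    mask = z == z_value
    beta, *_ = np.linalg.lstsq(f[mask], v[mask], rcond=None)
    return beta


# Toy district with four municipalities (hypothetical numbers).
share_above_65 = np.array([0.10, 0.20, 0.30, 0.40])
f = np.column_stack([np.ones(4), share_above_65])  # constant + one regressor
z = np.array(["in_district", "in_district", "in_municipality", "in_district"])
v = 0.2 + 0.5 * share_above_65  # vote shares that are exactly linear

beta_data = district_regression(v, f, z, "in_district")
print(beta_data)
```

Because the toy vote shares are exactly linear in the regressor, the fit recovers the intercept 0.2 and slope 0.5 up to floating-point error. With only a handful of municipalities per district, keeping the regressor count at one or two, as the text describes, keeps the system overdetermined.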


Step 2 Fix some parameter $\theta$, beliefs $T^d$, and values of $\alpha_d = \{\alpha_m\}_{m=1}^{M^d}$ and $\xi_d = \{\xi_m\}_{m=1}^{M^d}$. We can compute the vote shares for candidate $k$ in each of the municipalities, which we denote as $(v^{PRED}_{k1}(T^d, \alpha_1, \xi_1; \theta), \ldots, v^{PRED}_{kM^d}(T^d, \alpha_{M^d}, \xi_{M^d}; \theta))$. We can obtain a closed-form solution for the predicted vote share of sincere voters because $\varepsilon$ is distributed type 1 extreme value. Regarding strategic voters, the predicted vote share does not have a closed-form solution, and we use Monte Carlo integration. For the Monte Carlo integration, we take 10 draws of $\varepsilon$ for each value of the demographic characteristics $x$. As we group the voters into 32 types according to their characteristics $x$,^32 we take 320 draws of $\varepsilon$ for each municipality.
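The two computations in Step 2 can be sketched as follows. The sincere shares use the standard logit closed form implied by type 1 extreme value shocks. For strategic voters, the sketch assumes that a voter chooses the candidate $k$ maximizing $\sum_l T_{kl}(u_k - u_l)$, a pivotal-voter form in the spirit of MW. This decision rule, the function names, and all numbers are assumptions made for illustration and are not taken from the estimation code.

```python
import numpy as np


def sincere_shares(mean_utility, type_weights):
    """Closed-form logit shares averaged over voter types.
    mean_utility: (n_types, K); type_weights: (n_types,) summing to one."""
    u = mean_utility - mean_utility.max(axis=1, keepdims=True)  # stabilise exp
    p = np.exp(u)
    p /= p.sum(axis=1, keepdims=True)
    return type_weights @ p


def strategic_shares(mean_utility, type_weights, T, n_draws=10, seed=0):
    """Monte Carlo shares for strategic voters with Gumbel shocks.
    T: symmetric (K, K) tie probabilities with a zero diagonal."""
    rng = np.random.default_rng(seed)
    n_types, K = mean_utility.shape
    shares = np.zeros(K)
    for w, u_bar in zip(type_weights, mean_utility):
        u = u_bar + rng.gumbel(size=(n_draws, K))
        diff = u[:, :, None] - u[:, None, :]        # diff[r, k, l] = u_k - u_l
        score = (T[None, :, :] * diff).sum(axis=2)  # assumed expected gain from voting k
        choice = score.argmax(axis=1)
        shares += w * np.bincount(choice, minlength=K) / n_draws
    return shares


# 32 voter types and 3 candidates, with 10 draws per type (320 in total).
rng = np.random.default_rng(1)
mean_utility = rng.normal(size=(32, 3))
type_weights = np.full(32, 1 / 32)
# Beliefs that only candidates 0 and 1 can tie.
T = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])

v_sin = sincere_shares(mean_utility, type_weights)
v_str = strategic_shares(mean_utility, type_weights, T)
print(v_sin, v_str)
```

With beliefs concentrated on a tie between the first two candidates, the simulated strategic voters never vote for the third candidate, whereas sincere voters still do. This is the wedge between the two components of the vote share that the estimation exploits.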


31 As in footnote 28, we can identify the distribution of demographic characteristics $f_m$ with a vector of probabilities. We use the same notation $f_m$ to denote the distribution and the vector of probabilities. The vector $f_m$ contains, for example, the fraction of the population above 65, the fraction of the population in different income ranges, etc.

32 We discretize income into four groups, age into two groups, and education into four groups. Thus, we have 4 x 2 x 4 = 32 types.

Step 3 Parallel to Step 1, regress the simulated vote shares of candidate $k$, $(v^{PRED}_{k1}(T^d, \alpha_1, \xi_1; \theta), \ldots, v^{PRED}_{kM^d}(T^d, \alpha_{M^d}, \xi_{M^d}; \theta))$, on the demographic characteristics in each municipality $(f_1, \ldots, f_{M^d})$, conditioning on a particular value of $z$. We obtain the regression coefficient as

$$\beta_{k,d}(T^d, \alpha_d, \xi_d; \theta) = \arg\min_{\beta} \sum_{m=1}^{M^d} 1\{z_{km} = z\}\, (v^{PRED}_{km}(T^d, \alpha_m, \xi_m; \theta) - \beta \cdot f_m)^2.$$

Step 4 Because we do not know $T^d$, we vary $T^d \in \mathcal{T}(v^{data})$ to obtain the minimum and maximum values of the regression coefficients, $\underline{\beta}_{k,d}(\alpha_d, \xi_d; \theta)$ and $\bar{\beta}_{k,d}(\alpha_d, \xi_d; \theta)$, as in the main text. In practice, we discretize $\mathcal{T}(v^{data})$ with a grid size equal to 0.04.
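The grid search in Step 4 can be illustrated with a short sketch. It enumerates all points of a probability simplex whose coordinates are multiples of 0.04 and takes the minimum and maximum of a coefficient over the grid. The restriction of the grid to beliefs consistent with the observed vote shares, $\mathcal{T}(v^{data})$, is omitted here, and the function `toy_beta` is a hypothetical stand-in for the Step 3 regression coefficient.

```python
from itertools import product


def simplex_grid(n_dims, n_steps):
    """All points whose coordinates are multiples of 1/n_steps and sum to one."""
    points = []
    for head in product(range(n_steps + 1), repeat=n_dims - 1):
        rest = n_steps - sum(head)
        if rest >= 0:
            points.append(tuple(c / n_steps for c in head + (rest,)))
    return points


def coefficient_bounds(beta_of_T, grid):
    """Minimum and maximum of a scalar coefficient over the belief grid."""
    values = [beta_of_T(T) for T in grid]
    return min(values), max(values)


# Three candidates give three pairwise tie probabilities; step 1/25 = 0.04.
grid = simplex_grid(3, 25)


def toy_beta(T):
    # Hypothetical coefficient that varies with the beliefs T.
    return T[0] - 0.5 * T[1]


beta_lower, beta_upper = coefficient_bounds(toy_beta, grid)
print(len(grid), beta_lower, beta_upper)
```

For three candidates the grid has 351 points. The number of grid points grows quickly with the number of candidate pairs, which is one reason a coarse step such as 0.04 is a practical choice.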


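Step 5 We integrate out $\alpha_d$ and $\xi_d$ by simulating values of $\alpha_d$ and $\xi_d$ from $F_\alpha$ and $F_\xi$, and obtain $\underline{\beta}_{k,d}(\theta)$ and $\bar{\beta}_{k,d}(\theta)$, as defined in the main text. We draw 10 realizations of $\alpha_m$ and $\xi_m$ from $F_\alpha$ and $F_\xi$; hence we have $10 \times M^d$ draws for each district $d$.

Step 6 We take the average of $\underline{\beta}_{k,d}(\theta)$, $\bar{\beta}_{k,d}(\theta)$, and $\beta^{data}_{k,d}$ across $d$ and obtain the empirical analog as in the main text.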

Finally, to improve the sharpness of the identified set, we include another type of moment inequalities that harnesses the comovements in $\beta$ that result from varying $T$. Notice that in Step 4, we have computed the maximum and the minimum values of $\beta$ separately for each of the 9 types of regressions. But note that the coefficients from the regressions cannot move independently. Thus, in an effort to use some of these restrictions, we can construct additional moment inequalities by taking linear combinations of $\beta$. For example, let $\beta^{65}_{k,d}$ and $\beta^{inc}_{k,d}$ be the regression coefficients that we obtain in Steps 1 and 4 when we regress vote shares on the proportion of the population above 65 and the proportion of the population in the highest income quartile, respectively. Then we can consider $\max_{T^d} (\beta^{65}_{k,d}(T^d) - \beta^{inc}_{k,d}(T^d))$ and use this to form moment inequalities. More generally, for any matrix $A$, we can consider $\overline{A\beta}_{k,d} = \max_{T^d} A\beta_{k,d}(T^d)$ and $\underline{A\beta}_{k,d} = \min_{T^d} A\beta_{k,d}(T^d)$ and construct moment inequalities by following the same argument presented in the main text. We provide the exact form of the matrix $A$ that we use in our estimation in the Supplementary Material.


7.3 Appendix C: Computation of Misaligned Voting


The amount of misaligned voting is given by the fraction of voters who do not vote for their most preferred candidate. As we discussed in the main text, we do not have any individual voting records (we only observe vote shares at the municipality level), so we need to identify the extent of misaligned voting from aggregate data. In Step 1, we discuss issues arising from identifying the extent of misaligned voting from aggregate data, assuming that we can precisely recover the outcome when everyone votes sincerely. Then, in Steps 2 to 4, we discuss issues related to recovering the sincere-voting outcome from the estimated model.

Step 1

Let $v^{data}_k$ denote the actual vote share for candidate $k$ and let $v^{sin}_k$ denote the vote share of candidate $k$ when everyone votes sincerely (subscripts $d, m$ are suppressed from now on). Also, let $D_{kl}$ denote the total votes cast for candidate $k$ by strategic voters who prefer candidate $l$ most (inflow/outflow of misaligned votes from $l$ to $k$). Then the object of interest, the amount of misaligned voting, can be expressed as $\sum_k \sum_l D_{kl}$. On the other hand, the available information is summarized as $v^{data}_k - v^{sin}_k = \sum_l D_{kl} - \sum_l D_{lk}$, where $\sum_l D_{kl}$ is the inflow of misaligned votes into candidate $k$ and $\sum_l D_{lk}$ is the outflow of misaligned votes from candidate $k$. (Note that C1 implies that if $D_{kl} > 0$, then $D_{lk} = 0$.) The question we are concerned with is the following: What can we learn about $\sum_k \sum_l D_{kl}$ given that we only know $v^{data}_k - v^{sin}_k \ (= \Delta^k) = \sum_l D_{kl} - \sum_l D_{lk}$?

We can show that for $K = 3$, $\sum_k \sum_l D_{kl}$ can be bounded below by

$$lb(\{\Delta^k\}) = \max_k \{|\Delta^k|\}$$

and above by

$$ub(\{\Delta^k\}) = \max_k \{\Delta^k\} - \min_k \{\Delta^k\}.$$

We provide an analogous expression for $K = 4$ in the Supplementary Material. These bounds are also sharp among all bounds that can be obtained without imposing any distributional assumptions on the shocks in the utility function.^33 The proofs are provided in the Supplementary Material.
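For concreteness, the $K = 3$ bounds can be written as a small function. This is a direct transcription of the two formulas above. The function name and the example differences, expressed in percentage points, are ours.

```python
def misaligned_bounds_k3(delta):
    """Lower and upper bounds on total misaligned voting for K = 3,
    given delta[k] = actual share minus sincere share of candidate k."""
    if len(delta) != 3:
        raise ValueError("these bounds are stated for K = 3 only")
    lower = max(abs(d) for d in delta)
    upper = max(delta) - min(delta)
    return lower, upper


# Example in percentage points: candidate 0 gains 2 points,
# candidates 1 and 2 each lose 1 point (the differences sum to zero).
example_bounds = misaligned_bounds_k3([2.0, -1.0, -1.0])
print(example_bounds)
```

In this example the lower bound of 2 points corresponds to both losing candidates sending votes directly to the gaining candidate. The upper bound of 3 points allows for an additional flow between the two losing candidates that is invisible in the net differences.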


Step 2 to Step 4

Now we discuss issues related to recovering the sincere-voting outcome from the estimated model. Given preference parameters of the model, for any realization of $\xi$, we can compute what the outcome would be if all voters voted sincerely. We denote this predicted sincere-voting outcome as $v^{sin}(\theta, \xi)$. Ideally, we would know the actual realization of $\xi$, $\xi = \xi_0$, in each municipality, and compute the sincere-voting outcome, $v^{sin}(\theta, \xi_0)$, using this actual realization $\xi_0$ and a parameter value in the estimated set, $\theta \in \Theta_{CI}$. Then the difference between the observed vote share, $v^{data}$, and $v^{sin}(\theta, \xi_0)$, $(\Delta^k = v^{data}_k - v^{sin}_k(\theta, \xi_0))$, would allow us to compute the lower and upper bounds, $lb(\{\Delta^k\})$ and $ub(\{\Delta^k\})$. However, $\xi_0$ cannot be recovered uniquely. Also, the difference between $v^{data} = v(\xi_0)$ and $v^{sin}(\theta, \xi)$ depends on $\theta$, which we have only set-identified. Hence, we compute the bounds on the extent of misaligned voting in the following three steps (Step 2 to Step 4).

33 We do not know whether the bounds are sharp with regard to the class of DGPs that we considered in our estimation, where we have imposed distributional assumptions on the unobservable shocks in the utility function. As our estimation bypasses inference on $T$, it is difficult to obtain bounds that are, at the same time, computable and sharp with regard to the DGPs we considered in the estimation.

In Step 2, fix $\theta \in \Theta_{CI}$. For any given draw of $\tilde{\xi}$ from $F_\xi$, we compute $\Delta^k(\tilde{\xi})$,

$$\Delta^k(\tilde{\xi}) = v^{data}_k - v^{sin}_k(\theta, \tilde{\xi}),$$

and the corresponding bounds $lb(\{\Delta^k(\tilde{\xi})\})$ and $ub(\{\Delta^k(\tilde{\xi})\})$. By Monte Carlo, we then compute the expected value of the bounds, where the expectation is taken with regard to the randomness in $\tilde{\xi}$,

$$Lb_0 = \int lb(\{\Delta^k(\tilde{\xi})\})\, dF_\xi(\tilde{\xi}), \quad \text{and}$$
$$Ub_0 = \int ub(\{\Delta^k(\tilde{\xi})\})\, dF_\xi(\tilde{\xi}),$$

for each municipality, where $F_\xi$ is the estimated distribution of $\xi$. Note that $Lb_0$ and $Ub_0$ do not necessarily coincide with $lb(\{\Delta^k(\xi_0)\})$ and $ub(\{\Delta^k(\xi_0)\})$, which are the lower and upper bounds of the extent of misaligned voting we would obtain if we had full knowledge of the realization of $\xi$, $\xi = \xi_0$. Therefore, we need to account for the parts of $Lb_0$ and $Ub_0$ that are induced by the randomness in $\tilde{\xi}$. We discuss this in Step 3.

In Step 3, we evaluate the components of $Lb_0$ and $Ub_0$ that are induced by the randomness in $\tilde{\xi}$. To do so, we compute the mean effects of the randomness components by calculating (using Monte Carlo integration)

$$Lb_\xi = \int \int lb(\{\Delta^k(\tilde{\xi}, \tilde{\xi}')\})\, dF_\xi(\tilde{\xi})\, dF_\xi(\tilde{\xi}'), \quad \text{and}$$
$$Ub_\xi = \int \int ub(\{\Delta^k(\tilde{\xi}, \tilde{\xi}')\})\, dF_\xi(\tilde{\xi})\, dF_\xi(\tilde{\xi}'),$$

where $\Delta^k(\tilde{\xi}, \tilde{\xi}')$ is the difference in the vote share between two realizations of the municipality-level shock, $\tilde{\xi}$ and $\tilde{\xi}'$, i.e.,

$$\Delta^k(\tilde{\xi}, \tilde{\xi}') = v^{sin}_k(\theta, \tilde{\xi}) - v^{sin}_k(\theta, \tilde{\xi}').$$

We then compute the lower and upper bounds of misaligned voting at the municipality level as

$$LB = Lb_0 - Lb_\xi, \quad \text{and}$$
$$UB = Ub_0 - Ub_\xi.$$

In Step 4, we account for the fact that $\theta$ is only set-identified. So far, we have been computing $LB$ and $UB$ implicitly treating $\theta$ as given. By denoting the dependence on $\theta$ more explicitly, $LB$ and $UB$ above can be written as $LB(\theta)$ and $UB(\theta)$. Because $\theta$ is partially identified, we need to compute $LB(\theta)$ and $UB(\theta)$ by allowing $\theta$ to move in the partially identified set $\Theta_{CI}$ in order to construct the most conservative bounds on the extent of misaligned voting, $\underline{LB}$ and $\overline{UB}$, i.e.,

$$\underline{LB} = \min_{\theta \in \Theta_{CI}} LB(\theta), \quad \text{and}$$
$$\overline{UB} = \max_{\theta \in \Theta_{CI}} UB(\theta).$$

7.4 Appendix D: Computation of Second Counterfactual

Computation of the second counterfactual proceeds in the same way as described in Steps 2 to 4 in Appendix C. This is because, as was the case in our first counterfactual, we cannot recover the realization of the municipality-level random shock $\xi$, $\xi = \xi_0$. Denote the counterfactual vote share as $v^{sin}(\theta, \xi_0)$. The problem is that we cannot compute this because $\xi_0$ is unobserved. But we can obtain bounds for $v^{sin}(\theta, \xi_0)$ by following the same procedure as in Appendix C. We can also compute the number of seats in the same way. Note that we do not need to take Step 1 in this case.


8 Supplementary Material 


8.1 Supplementary Material A 


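In our estimation, we run regressions in Step 1 and Step 3 in order to obtain $\beta^{data}_{k,d}$ and $\beta_{k,d}(T^d, \alpha_d, \xi_d; \theta)$, which are

$$\beta^{data}_{k,d} = \arg\min_{\beta} \sum_{m=1}^{M^d} 1\{z_{km} = z\}\, (v^{data}_{km} - \beta \cdot f_m)^2, \quad \text{and}$$

$$\beta_{k,d}(T^d, \alpha_d, \xi_d; \theta) = \arg\min_{\beta} \sum_{m=1}^{M^d} 1\{z_{km} = z\}\, (v^{PRED}_{km}(T^d, \alpha_m, \xi_m; \theta) - \beta \cdot f_m)^2.$$

We run 9 different types of regressions (forty-eight regressions in total) for each district as follows.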

1. Regressing the vote share onto a constant and the fraction of population above 65 years old, i.e., $f_m$ = (1, "fraction of population above 65"). If we let $P$ denote {LDP, DPJ, JCP, YUS}, we run this regression for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

2. $f_m$ is a constant and the fraction of population with years of schooling between 12 and 14 years. Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

3. $f_m$ is a constant and the fraction of population with years of schooling between 15 and 16 years. Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

4. $f_m$ is a constant and the fraction of population with years of schooling over 16 years. Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

5. $f_m$ is a constant and the fraction of population with income in the first quartile (lower than 1.870 million yen). Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

6. $f_m$ is a constant and the fraction of population with income in the second quartile (between 1.870 million yen and 2.704 million yen). Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

7. $f_m$ is a constant and the fraction of population with income in the third quartile (between 2.704 million yen and 3.911 million yen). Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

8. $f_m$ is a constant. Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$.

9. $f_m$ is a constant. Regression is run for each combination of $(z_1^{POS}, \ldots, z_K^{POS}) \in P^K$ and $(z_k^{EXPR}, z_k^{HOME})$, where $z_k^{EXPR} \in$ {incumbent, previous political experience, no previous political experience}, and $z_k^{HOME} \in$ {hometown of the candidate is outside the prefecture, hometown of the candidate is inside the prefecture (but outside the district), hometown of the candidate is in the district (but outside municipality $m$), hometown of the candidate is in municipality $m$}.
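Each regression type above restricts the sample to municipalities whose candidate configuration matches a given value and then fits the vote share on a constant and one covariate. The following is a minimal sketch of that restricted least-squares step, not the authors' code: the function name `restricted_ols`, the data layout, and the toy numbers are all assumptions made for illustration.

```python
def restricted_ols(y, x, z, target):
    """OLS of y on (1, x) using only observations with z[m] == target.

    This minimizes sum_m 1{z_m == target} * (y_m - b0 - b1 * x_m)^2.
    Returns (b0, b1), or None when the slope is not identified.
    """
    rows = [(ym, xm) for ym, xm, zm in zip(y, x, z) if zm == target]
    n = len(rows)
    if n < 2:
        return None
    mean_y = sum(ym for ym, _ in rows) / n
    mean_x = sum(xm for _, xm in rows) / n
    sxx = sum((xm - mean_x) ** 2 for _, xm in rows)
    if sxx == 0:
        return None  # no variation in the covariate within the subsample
    sxy = sum((xm - mean_x) * (ym - mean_y) for ym, xm in rows)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: vote shares, share of population above 65, and the
# candidate configuration observed in each municipality.
vote_share = [0.25, 0.30, 0.35, 0.90]
share_over_65 = [0.1, 0.2, 0.3, 0.4]
config = [("LDP", "DPJ", "JCP")] * 3 + [("LDP", "DPJ", "YUS")]

coef = restricted_ols(vote_share, share_over_65, config, ("LDP", "DPJ", "JCP"))
print(coef)
```

The fourth municipality is excluded because its configuration differs, so it has no influence on the fitted coefficients.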

In order to improve the sharpness of the identified set, we include another type of moment inequality that harnesses the comovements in $\beta$ that result from changes in $T$, as discussed in Step 6 of Appendix B. We augment the moment conditions by using restrictions on the comovement of the coefficients for the 9th type of regressions. This allows us to add restrictions on the pairwise differences in the $\beta$s that relate to the effect of candidates' experience and hometowns, e.g., the difference in the vote share for an LDP candidate whose hometown is outside of the prefecture compared to an LDP candidate whose hometown is within the prefecture. In practice, the matrix $A$ used in Step 6 in our estimation is $A^{\top} = [\, I_{60} : B \,]$, where $B$ is a sparse matrix with entries in $\{-1, 0, 1\}$ that forms these pairwise differences, and $I_{60}$ is the identity matrix of size 60 x 60.
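To make the stacking concrete, the sketch below builds a small matrix with an identity block followed by pairwise-difference rows and applies it to a coefficient vector. It is only an illustration under stated assumptions: the helper names, the choice of index pairs, and the sign convention (second coefficient minus first) are hypothetical and are not taken from the estimation itself.

```python
def stacked_difference_matrix(n, pairs):
    """Return a matrix, as a list of rows, with an identity block on top.

    The first n rows are the n x n identity. Each (i, j) in `pairs` appends
    a row with -1 in column i and +1 in column j, so that the product of
    that row with beta equals beta[j] - beta[i].
    """
    rows = [[1 if col == row else 0 for col in range(n)] for row in range(n)]
    for i, j in pairs:
        diff = [0] * n
        diff[i] = -1
        diff[j] = 1
        rows.append(diff)
    return rows


def apply_matrix(matrix, beta):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, beta)) for row in matrix]


# Hypothetical example: 4 coefficients and 2 pairwise-difference restrictions.
A = stacked_difference_matrix(4, [(0, 1), (2, 3)])
print(apply_matrix(A, [3, 5, 10, 8]))  # [3, 5, 10, 8, 2, -2]
```

The output keeps the original coefficients and appends their pairwise differences, which is the sense in which the added rows supply extra restrictions.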


8.2 Supplementary Material B 


In this Supplementary Material, we prove that the bounds $ub(\{A^k\})$ and $lb(\{A^k\})$ we have used in Appendix C in fact constitute bounds and that they are sharp. Because the bounds are different for $K = 3$ and $K = 4$, we prove each case in turn. We drop subscripts $d$ and $m$ for the rest of the section.


8.2.1 Case of K = 3


For the case of $K = 3$, we prove that the extent of strategic voting is bounded by $lb(\{A^k\})$ and $ub(\{A^k\})$, where

$$lb(\{A^k\}) = \max_k\{|A^k|\}, \text{ and}$$

$$\begin{aligned}
ub(\{A^k\}) &= 1\{\#\{A^k > 0\} = 2\}\,\big(\max\{A^k \mid A^k > 0\} - \min\{A^k\}\big) \\
&\quad + 1\{\#\{A^k > 0\} = 1\}\,\big(\max\{A^k\} - \min\{A^k \mid A^k < 0\}\big) \\
&= \max\{A^k\} - \min\{A^k\},
\end{aligned}$$

and $\#\{A^k > 0\}$ indicates the number of $A^k$s that are positive, and $1\{\cdot\}$ is an indicator function. Let $D_{kl}$ denote the votes cast for candidate $k$ by strategic voters who prefer candidate $l$ most. Then the amount of misaligned voting is $\sum_{k,l} D_{kl}$ (note that C1 implies that if $D_{kl} > 0$, then $D_{lk} = 0$).

First, we prove that the extent of strategic voting is bounded by $lb(\{A^k\})$ and $ub(\{A^k\})$. Without loss of generality, index the candidates as 1, 2, and 3 such that the beliefs regarding the tie probabilities satisfy $T_{12} \geq T_{13} \geq T_{23}$. Then the amount of misaligned voting is $D = D_{12} + D_{13} + D_{23}$ (note that $D_{21} = D_{31} = D_{32} = 0$). Now, we can write

$$A^1 = D_{12} + D_{13}, \qquad \text{(A1)}$$
$$A^2 = D_{23} - D_{12}, \qquad \text{(A2)}$$
$$A^3 = -D_{13} - D_{23}. \qquad \text{(A3)}$$


Note that $|A^1| + |A^3| = D_{12} + 2D_{13} + D_{23} \geq D$, thus $|A^1| + |A^3|$ is an upper bound. We consider two cases: (i) $\#\{A^k > 0\} = 1$, and (ii) $\#\{A^k > 0\} = 2$. In case (i), we know that the positive number we observe is $A^1$, but cannot identify which of the two negative numbers corresponds to $A^2$ or $A^3$. In case (ii), we know that the negative number we observe is $A^3$, but we cannot identify which of the two positive numbers corresponds to $A^1$ or $A^2$. These two cases are exhaustive as $A^1 + A^2 + A^3 = 0$. In case (i),

$$\begin{aligned}
ub(\{A^k\}) &= \max\{A^k\} - \min\{A^k \mid A^k < 0\} = A^1 - \min\{A^2, A^3\} \\
&= |A^1| + \max\{|A^2|, |A^3|\} \\
&\geq |A^1| + |A^3|.
\end{aligned}$$

In case (ii),

$$\begin{aligned}
ub(\{A^k\}) &= \max\{A^k \mid A^k > 0\} - \min\{A^k\} = \max\{A^1, A^2\} - A^3 \\
&= \max\{|A^1|, |A^2|\} + |A^3| \\
&\geq |A^1| + |A^3|.
\end{aligned}$$

We can also see that $\max_k\{|A^k|\}$ is the lower bound because $|A^1| = D_{12} + D_{13} \leq D$, $|A^2| \leq D_{23} + D_{12} \leq D$, and $|A^3| = D_{13} + D_{23} \leq D$.
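The K = 3 bounds can be computed directly from the three observed values. The sketch below is an illustration only: the function name `bounds_k3` and the integer example values are hypothetical, and the all-zero input is handled by an added convention (both bounds equal to zero) that the text does not discuss.

```python
def bounds_k3(a):
    """Return (lb, ub) for three values a = (A1, A2, A3) that sum to zero.

    lb = max_k |A^k|
    ub = max{A^k | A^k > 0} - min{A^k}   when two values are positive
    ub = max{A^k} - min{A^k | A^k < 0}   when one value is positive
    """
    a = list(a)
    if len(a) != 3:
        raise ValueError("expected exactly three values")
    if abs(sum(a)) > 1e-9:
        raise ValueError("the values must sum to zero")
    lb = max(abs(v) for v in a)
    n_pos = sum(1 for v in a if v > 0)
    if n_pos == 2:
        ub = max(v for v in a if v > 0) - min(a)
    elif n_pos == 1:
        ub = max(a) - min(v for v in a if v < 0)
    else:
        ub = 0  # assumption: all values are zero, so no votes were shifted
    return lb, ub


print(bounds_k3((5, -2, -3)))  # (5, 8): case (i), one positive value
print(bounds_k3((4, 1, -5)))   # (5, 9): case (ii), two positive values
```

In both cases the upper bound equals the largest value minus the smallest value, in line with the simplified expression for the upper bound.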
Second, we prove by contradiction that the upper bound $ub(\{A^k\})$ is sharp. Suppose there exists an upper bound $h$ such that $h(A^1, A^2, A^3) \leq ub(\{A^k\})$ for all $\{A^k\}$, and moreover $h(A^{1*}, A^{2*}, A^{3*}) < ub(\{A^{k*}\})$ for some $\{A^{k*}\}$. Without loss of generality, consider the following two cases: (i) $A^{1*} > 0 > \max\{A^{2*}, A^{3*}\}$ and (ii) $\min\{A^{1*}, A^{2*}\} > 0 > A^{3*}$. Note that we cannot identify whether the two negative numbers in case (i) correspond to $A^{2*}$ or $A^{3*}$, and similarly, in case (ii), we cannot identify whether the two positive numbers correspond to $A^{1*}$ or $A^{2*}$. This is the reason why we use the min and the max operators. In case (i), if we let $D_{12} = A^{1*}$, $D_{23} = -\min\{A^{2*}, A^{3*}\}$ and $D_{13} = 0$, then the three equations (A1)-(A3) can be satisfied. In this instance, $D_{12} + D_{13} + D_{23} = A^{1*} - \min\{A^{2*}, A^{3*}\} = ub(\{A^{k*}\})$, achieving our bound. Hence, $h$ cannot be an upper bound. In case (ii), let $D_{12} = \max\{A^{1*}, A^{2*}\}$, $D_{13} = 0$, $D_{23} = -A^{3*}$. Then (A1)-(A3) are satisfied, and moreover, $D_{12} + D_{13} + D_{23} = \max\{A^{1*}, A^{2*}\} - A^{3*} = ub(\{A^{k*}\})$.

Third, we prove by contradiction that the lower bound $lb(\{A^k\})$ is sharp. Suppose there exists a lower bound $h$ such that $h(A^1, A^2, A^3) \geq lb(\{A^k\})$ for all $\{A^k\}$, and moreover $h(A^{1*}, A^{2*}, A^{3*}) > lb(\{A^{k*}\})$ for some $\{A^{k*}\}$. Without loss of generality, consider the following two cases: (i) $A^{1*} > 0 > \max\{A^{2*}, A^{3*}\}$ and (ii) $\min\{A^{1*}, A^{2*}\} > 0 > A^{3*}$. In case (i), let $D_{12} = -A^{2*}$, $D_{13} = -A^{3*}$, and $D_{23} = 0$. This satisfies the three equations (A1)-(A3) and moreover, $D_{12} + D_{13} + D_{23} = -A^{2*} - A^{3*} = A^{1*} = lb(\{A^{k*}\})$. In case (ii), let $D_{12} = 0$, $D_{23} = A^{2*}$, and $D_{13} = -A^{3*} - A^{2*}$. This also satisfies equations (A1)-(A3), and implies $D_{12} + D_{13} + D_{23} = -A^{3*} = lb(\{A^{k*}\})$. Thus, $h$ cannot be a lower bound.
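The constructions used in the two sharpness arguments can be checked numerically. The sketch below is a hedged illustration with hypothetical function names. It sorts the three values in descending order, which amounts to fixing one of the labelings the proof allows; it then builds flows $(D_{12}, D_{13}, D_{23})$ that attain each bound and maps them back through (A1)-(A3).

```python
def values_from_flows(d12, d13, d23):
    """Equations (A1)-(A3): map flows to (A1, A2, A3)."""
    return (d12 + d13, d23 - d12, -d13 - d23)


def flows_attaining_upper(a):
    """Flows with D13 = 0 whose total equals max(a) - min(a)."""
    return (max(a), 0, -min(a))


def flows_attaining_lower(a):
    """Flows whose total equals max_k |a_k|."""
    a1, a2, a3 = sorted(a, reverse=True)
    if a2 <= 0:
        # Case (i): one positive value; the two negatives play A2 and A3.
        return (-a2, -a3, 0)
    # Case (ii): two positive values; D13 = -a3 - a2, which equals a1.
    return (0, -a3 - a2, a2)


for a in [(5, -2, -3), (4, 1, -5)]:
    up = flows_attaining_upper(a)
    lo = flows_attaining_lower(a)
    # Both constructions use nonnegative flows and reproduce the same values.
    assert min(up) >= 0 and min(lo) >= 0
    assert sorted(values_from_flows(*up)) == sorted(a)
    assert sorted(values_from_flows(*lo)) == sorted(a)
    print(a, "upper total:", sum(up), "lower total:", sum(lo))
```

For (5, -2, -3) the totals are 8 and 5, and for (4, 1, -5) they are 9 and 5, matching the upper and lower bounds for those values.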


8.2.2 Case of K = 4


For the case of $K = 4$, the lower and upper bounds $lb(\{A^k\})$ and $ub(\{A^k\})$ are written as


lb({A*}) = 1{#{A* > 0} = 3} max {in 4 A'A® Al> 01, —min{A*|A* =< oy} 


+ 1{#{AF > 0} = 2} max {min{ At|A! > O},—min{A*|A¥ < oy} 


+ 1{#{AF > 0} = 1} max {max{A*}, —max{A*|A* < oy} , and 
ub({A*}) = max{2A" AN = max{A*|A* < 0} 


The proof is similar to the case of K = 3. 